PsiQuantum · 4 days ago
Site Reliability Engineer
PsiQuantum is focused on building the first useful quantum computers. The Site Reliability Engineer will manage the operation of monitoring stacks and ensure the reliability of services across both on-prem and AWS environments.
Computer · Hardware · Quantum Computing · Semiconductor · Software
Responsibilities
Define, implement, and iterate on Service Level Indicators & Service Level Objectives (SLIs/SLOs) and error budgets for critical services, with a focus on network reliability and data centre interconnects
Build and maintain Grafana dashboards that visualize golden signals (latency, traffic, errors, saturation), extending coverage to network telemetry such as packet loss, jitter, bandwidth utilization, and BGP/EVPN stability
Operate and tune the observability pipeline (Prometheus, Loki, Tempo) to ensure scalable, low-latency telemetry ingestion and alerting for networking as well as compute layers
Drive incident response: triage, mitigate, perform post-incident reviews, and implement preventive actions—particularly for network-related outages, congestion, or misconfigurations
Develop automation and self-service tooling in Python/Bash to streamline alerts, runbooks, and operational tasks, including network monitoring and diagnostics
Collaborate with Platform, Product, and Networking teams on capacity planning, performance testing, traffic engineering, and change management
Improve CI/CD health checks and release safety nets within GitLab, with attention to network dependencies in deployments
Contribute to Infrastructure as Code (Terraform, Ansible) for monitoring stack deployments and upgrades, including network observability tooling and configuration
Qualifications
Required
Bachelor's Degree or higher in Computer Science, Engineering, or related technical field
5+ years in an SRE, DevOps, or Production Engineering role supporting distributed systems in production
Hands-on expertise with observability tools: Grafana, Prometheus, Loki, Tempo (or equivalent)
Proven track record designing dashboards and alerts around golden signals and USE/RED methodologies, extended to network utilization, saturation, and error metrics
Solid scripting/automation skills in Python and Bash; familiarity with GitLab CI pipelines
Operational experience with Kubernetes and containerized workloads
Strong working knowledge of AWS services, data centre networking fundamentals, routing protocols, load balancing, and network overlays (e.g., VXLAN/EVPN)
Experience running incident response and writing actionable post-mortems, including for network-related events
Familiarity with Infrastructure as Code (Terraform, Ansible) and configuration management
Strong communication and collaboration skills; comfortable acting as a generalist across infrastructure, networking, application, and data layers
Preferred
Exposure to regulated environments, multi-region networking architectures, and hybrid on-prem/cloud topologies
Benefits
Equity
Benefits
Company
PsiQuantum
PsiQuantum focuses on developing a scalable quantum computer using photonic qubits.
H1B Sponsorship
PsiQuantum has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (12)
2024 (10)
2023 (4)
2022 (7)
2021 (15)
2020 (5)
Funding
Current Stage: Late Stage
Total Funding: $2.3B
Key Investors: BlackRock, Atomico, Founders Fund
2025-09-10 · Series E · $1B
2025-05-30 · Series Unknown · $22M
2024-11-05 · Secondary Market
Leadership Team
Peter Shadbolt
Co-founder and Chief Strategy Officer
Recent News
2026-01-06
redpoint.com
2026-01-05
globalventuring.com
2025-12-31
Company data provided by Crunchbase