Staff Site Reliability Engineer jobs in United States

Synthesis Health · 19 hours ago

Staff Site Reliability Engineer

Synthesis Health is a mission- and values-driven company focused on revolutionizing healthcare through innovation and collaboration. They are seeking a Staff Site Reliability Engineer to ensure platform availability and lead operational maturity, focusing on automation and disaster recovery strategies.

Health Care · Medical · Wellness

Responsibilities

Own the 99.99% Target: You will define the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our critical user journeys. You will be accountable for tracking our Error Budgets and governing the release velocity based on platform stability
Incident Management & Forensics: You will own the incident response process, serving as the ultimate escalation point for complex production outages. You will lead blameless post-mortems (RCAs) to identify root causes and ensure systemic fixes are implemented to prevent recurrence
Eliminate Toil: You will ruthlessly identify and automate manual operational tasks. Your goal is to engineer yourself out of operations work so you can focus on high-value reliability architecture
Architect for Catastrophe: You will design and implement our Business Continuity and Disaster Recovery strategy. You will orchestrate our regional failover capabilities, ensuring we meet aggressive Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO)
Enterprise-Grade Resilience: You will build the technical credibility required to win grueling enterprise audits. You will demonstrate that our platform is robust, stable, and resistant to unexpected failures through rigorous documentation and proof-of-concept demonstrations
"Game Day" Simulations: You will lead regular disaster recovery drills and chaos engineering experiments to validate our failover mechanisms, ensuring our team is practically prepared for real-world scenarios
Intelligent Auto-Scaling: You will design and implement sophisticated auto-scaling strategies (HPA/VPA/Cluster Autoscaler) on Kubernetes (GKE) to handle unpredictable spikes in medical data ingestion
Capacity Planning: You will lead capacity planning and cost optimization initiatives, ensuring our infrastructure scales efficiently with our business growth
Resilience Patterns: You will work with the Architecture Review Board (ARB) to enforce resilience patterns (circuit breakers, retries, fallbacks, bulkheads) in our application code and service mesh
Mentorship & Culture: You will advocate for SRE culture across the engineering organization, mentoring feature teams on how to build operable, observable, and reliable software

Qualifications

Site Reliability Engineering · Kubernetes · Infrastructure as Code · Observability · Cloud Native · BC/DR Orchestration · Coding Proficiency · Healthcare Experience · Global Traffic Management · Chaos Engineering · Automation · Mentorship

Required

8+ years of engineering experience, with a significant focus on Site Reliability Engineering or DevOps in a high-scale, 24/7 production environment
Proven experience designing active-passive or active-active multi-region architectures
Track record of executing regional failovers and managing the complexities of data replication and consistency during outages
Deep, hands-on expertise with Kubernetes (GKE preferred)
Understanding of the internals of scheduling, networking (CNI), and storage (CSI)
Expert-level proficiency with Terraform or similar IaC tools
Deep experience implementing and tuning observability stacks (Prometheus, Grafana, Datadog, or similar)
Ability to extract meaningful signals from noise
Capable coder in Go, Python, or TypeScript
Experience diving into application code to debug production issues or build complex automation tooling
Deep experience with public cloud providers (GCP preferred) and their managed services

Preferred

Experience supporting HIPAA-compliant environments or handling PHI (Protected Health Information)
Experience with multi-region architectures, global load balancing, and CDN tuning
Experience designing and running chaos experiments to validate system resilience

Benefits

Medical
Dental
Vision
“Use as needed” vacation policy
Participation in our employee option program

Company

Synthesis Health

Synthesis Health provides diagnostic and interventional radiology services.

Funding

Current Stage: Growth Stage
Total Funding: $1.83M
2025-08-06 · Series Unknown · $1.83M
2023-01-01 · Seed

Leadership Team

Deepak Kaura, Chief Product Officer
Company data provided by Crunchbase