Avalon Healthcare Solutions · 1 month ago

Manager, Site Reliability Engineer

Avalon Healthcare Solutions is the nation’s leader in diagnostic intelligence, focused on transforming the role of diagnostic testing across the healthcare ecosystem. The Manager of Site Reliability Engineering will lead a team of SREs to ensure the availability, scalability, and performance of production and cloud infrastructure, driving operational excellence and strengthening cloud security.

Analytics, Health Care, Hospital, Medical

Responsibilities

Lead, mentor, and grow a high-performing team of SREs through coaching, training, goal-setting, and performance feedback
Foster a culture of operational excellence and continuous improvement
Collaborate closely with development, security, and product teams to balance reliability with feature delivery velocity
Own the end-to-end incident response process, including on-call management (PagerDuty), escalation handling, and RCA facilitation
Establish and enforce SLIs, SLOs, and error budgets in alignment with business priorities
Lead regular game days, failover tests, and resilience reviews
Implement and maintain end-to-end monitoring, alerting, and observability using tools such as CloudWatch, Prometheus, and Grafana
Drive visibility into system health, performance, and capacity trends through dashboards and metrics
Collaborate with development teams to optimize service performance and latency
Oversee infrastructure operations and deployment pipelines leveraging AWS (ECS/Fargate, EC2, Lambda, RDS, CloudFront, ALB/NLB)
Manage container orchestration, ensuring secure and efficient image management, scaling, and deployments
Advance Infrastructure as Code (IaC) practices using Terraform and CloudFormation
Drive automation across environment provisioning, CI/CD, and compliance checks
Partner with Security to implement and monitor AWS security controls (IAM least privilege, KMS, Secrets Manager, GuardDuty, Config, Security Hub, and Control Tower)
Ensure adherence to compliance frameworks (e.g., SOC 2, ISO 27001, HIPAA)
Conduct vulnerability remediation and security posture reviews across cloud and container environments
Define and report on SRE metrics (MTTR, MTBF, change failure rate, incident frequency)
Champion service reliability reviews and architecture improvements to reduce toil and improve resilience
Stay abreast of emerging AWS services and SRE best practices to evolve the platform

Qualifications

AWS, Incident response, Infrastructure as Code, Monitoring, Observability, Container orchestration, Security best practices, CI/CD, Leadership, Communication, Cross-functional collaboration

Required

7-10 years of experience in SRE, DevOps, or Infrastructure Engineering, and 2+ years in a leadership or management role
Bachelor of Science (4-year) degree in a technical field such as engineering or computer science, or extensive relevant work experience
Strong background in AWS operations, containerized workloads, and modern observability stacks
Experience leading incident response programs and implementing operational runbooks
Proven track record of automating infrastructure and enforcing security best practices
Excellent communication and cross-functional leadership skills

Preferred

Exposure to financial or healthcare compliance and audit frameworks (PCI, SOC 2, ISO 27001, or HITRUST)
Familiarity with chaos engineering and capacity planning methodologies
Snowflake and data/ETL exposure

Company

Avalon Healthcare Solutions

Avalon is the world’s first Lab Insights company pioneering a new era of value-driven care by unlocking the potential of lab results.

Funding

Current Stage: Growth Stage
Total Funding: $54.27M
Key Investors: Francisco Partners
2018-01-01 · Debt Financing · $25M
2017-02-06 · Series C · $29.27M
2016-01-07 · Series B

Leadership Team

William (Bill) Kerr
Co-Founder & CEO
Company data provided by Crunchbase