GHX · 1 day ago
Senior Manager, Site Reliability Engineering (SRE)
Global Healthcare Exchange (GHX) is a healthcare business and data automation company that empowers healthcare organizations to enhance patient care and maximize savings. The Senior Manager, Site Reliability Engineering (SRE) will lead the SRE organization in delivering reliable, scalable, and resilient platforms and services, overseeing the strategy and implementation of a unified observability platform while driving a culture of reliability within the engineering teams.
Hospital & Health Care
Responsibilities
Hire, lead, and mentor a high-performing SRE team across geographies
Define and execute the SRE vision, roadmap, and strategy in alignment with business and engineering objectives
Establish a healthy 24x7 on-call model, ensuring coverage while promoting team well-being
Drive a blameless culture through structured postmortems and RCA follow-up actions
Build and manage a unified observability platform leveraging tools such as New Relic, Datadog, CloudWatch, Prometheus, Grafana, Graylog, and OpenTelemetry
Deliver holistic monitoring across infrastructure, applications, databases, APIs, and end-user experience
Implement APM (Application Performance Monitoring) to trace performance across distributed systems
Establish dashboards, metrics, and proactive alerting to identify anomalies early
Drive adoption of AIOps and predictive analytics for proactive reliability improvements
Define and manage SLIs, SLOs, SLAs, and Error Budgets across services
Partner with engineering teams to balance velocity with reliability, ensuring adherence to Error Budgets
Reduce MTTD (Mean Time to Detect) and MTTR (Mean Time to Resolve) through automation, faster detection, and better instrumentation
Perform capacity planning, scalability reviews, and resiliency testing
Lead major incident response, coordinating communications with executives and stakeholders
Drive root cause analysis (RCA) and implement long-term fixes
Partner with ITSM teams to align with incident, problem, and change management processes
Ensure continuous improvement loops from incidents back into observability, automation, and engineering practices
Collaborate with Engineering, Product, Security, Cloud, and DevOps teams to embed SRE practices
Provide guidance on instrumentation, reliability design, and operational readiness for new services
Partner with DBAs and data platform teams to monitor database health, replication, query performance, and failover readiness
Champion reliability as a shared responsibility across development and operations
Qualifications
Required
12+ years of experience in SRE, Operations, or Infrastructure Engineering, with 5+ years in leadership roles
Proven expertise in unified observability, monitoring, and alerting across infra, apps, APM, and databases
Strong knowledge of observability tools: New Relic, Datadog, Prometheus, Grafana, Graylog, CloudWatch, OpenTelemetry, SolarWinds
Hands-on with incident response, RCA, MTTR/MTTD reduction, and on-call management
Deep understanding of SLIs, SLOs, SLAs, and Error Budgets
Strong AWS experience (EC2, ECS, EKS, networking, scaling groups)
Hands-on with containers & orchestration (Docker, Kubernetes)
Proficiency in Python, Java, C#, and shell scripting for automation
Knowledge of networking fundamentals, distributed systems, and high-availability architectures
Familiarity with ITIL/ITSM processes (incident, problem, change)
Strong leadership, stakeholder management, and communication skills
Preferred
Experience in large-scale SaaS or product-driven environments
Hands-on experience with databases: MongoDB, Elasticsearch, SQL Server, Oracle
Experience with chaos engineering, resiliency testing, and disaster recovery planning
Certifications: AWS Solutions Architect / DevOps Engineer, CKAD/CKA
Experience managing global SRE teams across time zones
Proven ability to embed reliability into engineering culture via SLOs and Error Budgets
Benefits
Health, vision, and dental insurance
Accident and life insurance
401(k) matching
Paid time off
Education reimbursement
Company
GHX
GHX is a software-as-a-service company that’s reducing the cost of doing business in healthcare by automating supply chain processes and improving visibility into the products used in patient care.
H1B Sponsorship
GHX has a track record of offering H1B sponsorships. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (5)
2024 (9)
2023 (9)
2022 (3)
2021 (13)
2020 (2)
Funding
Current Stage
Late Stage
Recent News
The Hans India
2024-02-11
Company data provided by Crunchbase