Insulet Corporation · 3 hours ago
Director, Site Reliability Engineering (Hybrid/Flexible)
Insulet Corporation is an innovative medical device company dedicated to simplifying life for people with diabetes. The Director of Site Reliability Engineering will provide strategic leadership and technical direction for the reliability, scalability, and performance of mission-critical systems and services, while partnering with various teams to ensure high-availability systems that meet business objectives.
Health Care · Medical · Medical Device
Responsibilities
Provide strategic direction for the organization-wide adoption, evolution, and maturity of SRE principles, cultivating a culture centered on reliability, efficiency, and continuous improvement
Develop and oversee automation strategies, tools, and frameworks that improve system reliability, reduce operational toil, and enhance team productivity
Architect and evolve robust observability, monitoring, and alerting systems to ensure availability, performance, and real‑time operational insight
Lead and govern high‑severity incident response practices—ensuring rapid triage, thorough root cause analysis, and follow‑through on corrective and preventative actions
Analyze reliability, performance, and capacity metrics to drive proactive optimization and long‑term system resilience
Partner with engineering, product, and operations teams to embed SRE practices throughout the development lifecycle and influence architectural decisions for reliability
Build, mentor, and develop a high‑performing SRE organization, fostering technical excellence, career growth, and a strong culture of knowledge sharing
Oversee capacity planning, scalability assessments, and future‑state demand forecasting across critical systems
Establish and maintain comprehensive documentation of SRE processes, standards, frameworks, and best practices
Define the technical strategy and roadmap for SRE, including automation, reliability frameworks, tooling, monitoring, and operational best practices
Serve as final decision-maker during major incidents, including prioritization of remediation and long‑term reliability actions
Allocate resources, manage staffing decisions, and oversee budget planning for SRE initiatives
Establish, approve, and enforce service level objectives (SLOs), error budgets, and performance standards for systems and services
Drive process and operational improvements that enhance system reliability and organizational efficiency
Evaluate, select, and govern the adoption of third‑party tools and platforms related to observability, incident response, reliability testing, and automation
Define and approve training, development programs, and readiness standards for the SRE organization
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or a related field
16 years of experience in the field, including 6+ years in Site Reliability Engineering, DevOps, or a similar role
Proven experience architecting and managing highly available, scalable, and fault-tolerant systems
Ability to define a clear reliability vision and inspire teams and stakeholders toward long‑term reliability goals
Demonstrated sound judgment and calm decision‑making under pressure, particularly during high‑severity incidents
Strong people leadership skills, with experience coaching, mentoring, and developing engineering talent
Strategic planning skills with a track record of aligning technical direction with organizational objectives
Excellent communication skills; able to translate complex technical issues into clear, actionable insights for executive and non‑technical audiences
Highly collaborative, with the ability to work effectively across engineering, product, operations, and business functions
Skilled at navigating conflict and fostering healthy team dynamics
Proactive problem solver who identifies risks and drives innovative solutions
Strong sense of accountability for team outcomes, reliability standards, and operational excellence
Expertise with observability and monitoring platforms such as Datadog, Prometheus, Dynatrace, Grafana, ELK, or similar
Strong proficiency in programming languages such as Python, Go, or Java
Deep understanding of cloud platforms (AWS, Azure, GCP) and container orchestration technologies (Docker, Kubernetes)
Advanced knowledge of AWS services including VPC, Lambda, IAM, ELB, EC2, ECS, CloudWatch, API Gateway, S3, SQS, SNS, WAF, and Route 53
Hands-on experience with infrastructure‑as‑code tools such as Terraform, Ansible, or equivalents
Expert troubleshooting and problem-solving skills across distributed systems
Strong leadership and communication skills with a proven ability to work cross-functionally
Demonstrated success leading and mentoring engineering teams
Strong understanding of security best practices, compliance frameworks, and implementation of security controls
Experience with chaos engineering, resilience testing, and failure-injection methodologies
Familiarity with applying AI/ML approaches to reliability, operations, and incident management
Benefits
Medical, dental, and vision insurance
401(k) with company match
Paid time off (PTO)
Additional employee wellness programs
Company
Insulet Corporation
Insulet Corporation (NASDAQ: PODD), headquartered in Massachusetts, is an innovative medical device company dedicated to simplifying life for people with diabetes and other conditions through its Omnipod product platform.
H1B Sponsorship
Insulet Corporation has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (58)
2024 (43)
2023 (19)
2022 (33)
2021 (41)
2020 (17)
Funding
Current Stage: Public Company
Total Funding: $629.5M
Key Investors: Deerfield, OrbiMed, Alta Partners
2025-03-18 · Post-IPO Debt · $450M
2009-03-16 · Post-IPO Debt · $60M
2007-05-15 · IPO
Company data provided by Crunchbase