Blue Shield of California · 1 month ago
Site Reliability Engineer, Consultant
Blue Shield of California is seeking an experienced Site Reliability Engineer (SRE) to lead reliability, scalability, and performance initiatives across its production systems. The role involves designing, implementing, and maintaining reliable systems that support millions of requests daily, with a focus on automation and incident management.
Financial Services · Health Insurance · Non Profit
Responsibilities
Design and maintain systems to achieve high availability (99.9%+), scalability, and resilience
Build and improve monitoring stacks using tools like Prometheus, Grafana, Datadog, or New Relic
Reduce manual toil by automating deployments, scaling, and recovery processes using IaC (Terraform or CloudFormation)
Lead and respond to production incidents, perform root-cause analysis, and drive postmortems and prevention strategies
Identify system bottlenecks and improve performance across compute, network, and database layers
Forecast growth, conduct load testing, and ensure services can handle future demand
Implement best practices for infrastructure security, secrets management, and compliance requirements
Partner with developers to embed reliability practices into the SDLC, CI/CD pipelines, and application architecture
Design and execute chaos testing experiments to proactively identify weaknesses in distributed systems and improve overall resilience
Implement and manage Blue/Green and Canary deployment methodologies to minimize risk and ensure safe, incremental rollouts of new features and updates
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience); Master's degree a plus
7+ years of experience in building, supporting, and improving production systems and infrastructure
Minimum 5 years of hands-on experience with Azure, AWS, or GCP
Demonstrated expertise in virtual machines (VMs), containers, cloud networking, identity and access management (IAM), monitoring, storage, and serverless functions
Comfortable deploying and managing cloud-native services and infrastructure
Proficiency in one or more languages such as Python, Go, Java, Bash, PowerShell, or similar
Ability to write clean, maintainable code for automation and tooling
Experience working with Kubernetes, Docker, and tools like Helm or Red Hat OpenShift
Familiarity with managing containerized applications in production environments
Working knowledge of tools such as Prometheus, Grafana, Datadog, New Relic, ELK Stack, Dynatrace, Splunk, BigPanda, and SolarWinds
Ability to set up dashboards, alerts, and metrics to ensure system health and performance
Experience with CI/CD pipelines using tools like Jenkins, GitHub Actions, GitLab CI, Argo CD, Spinnaker
Familiarity with configuration management tools such as Ansible, Chef, Puppet
Experience with chaos engineering tools (e.g., Gremlin, Chaos Monkey) and methodologies
Hands-on knowledge of Blue/Green and Canary deployment strategies in cloud-native environments
Preferred
Understanding of agentic AI systems and automation frameworks for incident response and infrastructure optimization
Interest in exploring intelligent automation to improve reliability and reduce manual toil
Company
Blue Shield of California
Blue Shield of California is a health insurance service provider.
H1B Sponsorship
Blue Shield of California has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2023 (1)
2022 (41)
2021 (20)
2020 (31)
Funding
Current Stage
Late Stage
Company data provided by Crunchbase