CVS Health · 2 months ago
Staff Observability Operations Engineer
CVS Health is a premier health innovation company dedicated to transforming health care. They are seeking a Staff Observability Operations Engineer to oversee and optimize their observability platform, ensuring seamless operations through deployment, management, and troubleshooting of observability solutions.
Health Care, Medical, Pharmaceutical, Retail, Sales
Responsibilities
Deploy and implement modern observability solutions to meet organizational needs
Ensure successful integration of observability, event management, and notification tools and technologies within the existing environment
Work with partners to migrate legacy monitoring to modern solutions
Work with the observability engineering team to address new requirements as they arise, leveraging existing solutions or developing new ones
Manage and administer observability and event management platforms
Lead system upgrades, patching, and maintenance activities to ensure optimal performance and security
Coordinate and manage release cycles for observability platforms
Ensure smooth and timely releases with minimal disruption to services
Troubleshoot and resolve incidents related to observability platforms
Manage escalated customer issues and requests, ensuring timely and effective resolution
Document incident remediation activities to enable resolution by L1/L2 MSP partners; automate remediation activities where possible
Continuously monitor and enhance platform performance to support scalability and complexity
Utilize telemetry data to automate performance optimization and capacity planning
Collaborate with cross-functional infrastructure, application, and business stakeholders to ensure observability solutions align with the broader IT strategy and infrastructure requirements
Communicate effectively with team members, management, and other stakeholders
Identify opportunities for process optimization and efficiency gains
Stay current with industry trends and best practices to continuously improve observability operations
Ensure high levels of customer satisfaction by effectively managing customer relationships
Provide excellent customer service and support for observability solutions
Ensure observability platforms comply with organizational policies and security standards
Implement tools and processes to detect and remediate configuration drift and security risks
Maintain comprehensive documentation of observability platform configurations, processes, and procedures
Generate and analyze reports on platform performance and capacity
Provide training and mentoring to junior engineers, team members, and our MSPs
Share knowledge and best practices to enhance the overall capability of the team
Qualifications
Required
7+ years of experience in IT operations, with significant responsibilities in system monitoring, performance tuning, and troubleshooting enterprise applications
5+ years in a Site Reliability Engineering (SRE) role deploying and managing modern observability solutions
5+ years managing and implementing observability and event management platforms (e.g., AppDynamics, Splunk, Prometheus, Grafana)
Experience developing and administering ServiceNow ITOM event management solutions, ensuring seamless integration with observability tools
Experience deploying and managing service reliability platforms (e.g., xMatters, OpsGenie, PagerDuty), configuring incident notifications, incident command workflows, and automating incident remediation workflows
Experience with and deep knowledge of cloud environments, cloud monitoring platforms, and container orchestration tools (e.g., AWS/CloudTrail, Azure/Monitor, GCP/GCM, Kubernetes, OpenShift)
Proficiency in Python and other scripting and automation tools such as Ansible, PowerShell, and Bash for automation and configuration. Experience with and passion for deploying things 'as code'
Hands-on experience deploying, managing, and administering observability platforms
Hands-on experience leading, coordinating, and performing migration of application, platform, and infrastructure observability solutions (e.g., full-stack APM, RUM, Session Replay, Server, Storage, Network, Database, NLB, etc.) from legacy tools to modern platforms
Hands-on experience performing system upgrades, patching, and integrations to ensure platform stability and security
Experience developing and implementing monitoring and logging standards for infrastructure, platforms, and applications
Experience building and instrumenting dashboards to deliver technical and business process insights leveraging standard observability/BI platforms (e.g., AppDynamics, Grafana, Tableau, PowerBI)
Experience establishing and implementing event correlation policies and related rules to enrich event data, increase the signal-to-noise ratio for events, and reduce MTTD and MTTR
Excellent problem-solving skills, with the ability to handle multiple tasks, prioritize effectively, and work under pressure
Proven ability to troubleshoot and resolve complex technical issues related to observability platforms
Experience managing customer issues and requests, providing timely and effective solutions
Experience monitoring platform performance and implementing enhancements to support scalability and complexity
Experience leveraging telemetry data to automate performance optimization and capacity planning
Proficiency in scripting and automation tools (e.g., Ansible, PowerShell, Bash, Python) and data formats (YAML, XML, JSON) to automate deployment, configuration, and instrumentation
Experience coordinating and managing release cycles for observability platforms
Knowledge of best practices in release management to ensure smooth and timely deployments
Experience configuring and leveraging source code management tools and workflows to manage and deploy Monitoring as Code
Excellent communication skills, both verbal and written
Ability to collaborate effectively with cross-functional teams and stakeholders
Strong interpersonal skills, with the ability to engage effectively with both technical teams and business stakeholders
Commitment to continuous improvement and staying current with industry trends and best practices
Ability to identify opportunities for process optimization and efficiency gains
Strong customer service orientation with the ability to manage customer relationships effectively
Experience in providing excellent customer service and support for observability solutions
Knowledge of compliance and security standards related to observability platforms
Ability to implement tools and processes to detect and remediate configuration drift and security risks
Experience managing operational data and systems access to ensure compliance with internal and external audit and regulatory requirements
Proficiency maintaining comprehensive documentation of observability platform configurations, processes, and procedures
Ability to generate and analyze reports on platform performance, incidents, and customer requests
Preferred
ITIL 4 Practitioner: Monitoring and Event Management
DevOps Institute Observability Foundation
DevOps Institute Site Reliability Engineering Foundation or Practitioner
ServiceNow CIS-Event Management Implementer
ServiceNow Certified Application Developer
xMatters Integrator
Benefits
Affordable medical plan options, a 401(k) plan (including matching company contributions), and an employee stock purchase plan.
No-cost programs for all colleagues including wellness screenings, tobacco cessation and weight management programs, confidential counseling and financial coaching.
Benefit solutions that address the different needs and preferences of our colleagues including paid time off, flexible work schedules, family leave, dependent care resources, colleague assistance programs, tuition assistance, retiree medical access and many other benefits depending on eligibility.
Company
CVS Health
CVS Health is a health solutions company that provides integrated healthcare services to its members.
H1B Sponsorship
CVS Health has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart; legend: represents job field similar to this job)
Trends of Total Sponsorships (chart): 2022 (1)
Funding
Current Stage: Public Company
Total Funding: $4B
Key Investors: Michigan Economic Development Corporation, Starboard Value
2025-08-15: Post-IPO Debt · $4B
2025-07-17: Grant · $1.5M
2019-11-25: Post-IPO Equity
Company data provided by Crunchbase