Capgemini · 2 hours ago
Metrics Platform Site Reliability Engineer
Capgemini, a global business and technology transformation partner, is seeking a Metrics Platform Site Reliability Engineer. The role involves managing a team of Site Reliability Engineers, implementing SRE strategies, and ensuring system reliability and performance.
Consulting · Information Technology · InsurTech · IT Management · Software
Responsibilities
Manage and mentor a team of Site Reliability Engineers
Define and implement SRE strategies and best practices in alignment with organizational objectives
Monitor clients' service level agreements (SLAs), service level objectives (SLOs), and service level indicators (SLIs)
Lead initiatives to improve system reliability, availability, scalability, and performance
Collaborate with development and operations teams to ensure reliability and resiliency goals are met
Implement and improve incident management processes to minimize downtime and ensure timely resolutions
Review and contribute to the architecture of critical systems, ensuring they meet reliability and performance goals
Drive observability practices by implementing robust monitoring, logging, and alerting systems
Implement and maintain monitoring systems to proactively identify potential issues and alert engineers to problems before they impact users
Respond to incidents and outages, diagnose problems, and implement solutions to minimize downtime and restore service
Automate repetitive tasks and processes to improve efficiency and reduce manual effort
Identify and address performance bottlenecks to ensure systems run efficiently and effectively
Manage and maintain the underlying infrastructure, including servers, networks, and cloud resources
Plan for future capacity needs to ensure systems can handle anticipated workloads
Develop and maintain processes for deploying software updates and releases
Work closely with developers, operations teams, and other stakeholders to ensure system reliability and availability
Maintain clear and concise documentation of systems, processes, and procedures
Identify areas for improvement and implement changes to enhance system reliability and performance
Qualifications
Required
Proficiency in writing Splunk queries and alerts is a must
Hands-on experience with at least one APM tool (New Relic, AppDynamics, Honeycomb, Datadog) is a must
Expertise in automation tools and scripting languages (Python or JavaScript) is a must
Proficiency in scripting languages (Python or Node.js) is a must
Proficiency in at least one cloud platform (AWS, GCP, Azure) is a must
Strong understanding of distributed systems, microservices architecture, and container orchestration tools (e.g., Kubernetes)
Experience with monitoring tools such as Prometheus and Grafana is a must
Benefits
Flexible work
Healthcare including dental, vision, mental health, and well-being programs
Financial well-being programs such as 401(k) and Employee Share Ownership Plan
Paid time off and paid holidays
Paid parental leave
Family building benefits like adoption assistance, surrogacy, and cryopreservation
Social well-being benefits like subsidized back-up child/elder care and tutoring
Mentoring, coaching and learning programs
Employee Resource Groups
Disaster Relief
Company
Capgemini
Capgemini is a software company that provides consulting, technology, and digital transformation services.
H1B Sponsorship
Capgemini has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (2856)
2024 (3012)
2023 (3424)
2022 (4392)
2021 (3311)
2020 (5871)
Funding
Current Stage
Public Company
Total Funding
$4.72B
2025-09-18 · Post-IPO Debt · $4.72B
1999-04-01 · IPO
Company data provided by Crunchbase