Staff SRE (Reliability Engineering) jobs in United States

Celonis · 3 days ago

Staff SRE (Reliability Engineering)

Celonis is the global leader in Process Intelligence technology and one of the world's fastest-growing SaaS firms. The Staff SRE will play a critical role in ensuring the health, performance, and resilience of the platform by applying advanced software engineering and Site Reliability Engineering principles to drive system reliability and operational excellence.

Analytics, Artificial Intelligence (AI), Big Data, Business Intelligence, Business Process Automation (BPA), SaaS
No H1B sponsorship

Responsibilities

Join a highly technical, collaborative, and innovation-driven team that blends Site Reliability Engineering with modern Software Engineering practices to build resilient and scalable systems
Lead reliability efforts for a fleet of 80+ FedRAMP-compliant microservices running on Kubernetes, applying SRE principles to drive observability, automation, and incident prevention
Develop and enforce SLOs, SLAs, and error budgets to drive reliability-focused development
Provide mentorship and technical leadership across the SRE and engineering teams
Own high-priority application incident escalations, performing deep technical analysis and restoration within defined SLOs, while continuously improving detection and response mechanisms
Engineer solutions to enhance the availability, latency, and performance of production services, automating manual processes to eliminate toil and scale operational efficiency
Collaborate closely with platform and application engineering teams to conduct post-incident reviews, extract insights, and implement systemic changes that improve overall reliability

Qualifications

Site Reliability Engineering, Cloud platforms (AWS, GCP, Azure), Java, Spring framework, Python, Kubernetes, CI/CD pipelines, Infrastructure as Code (IaC), Incident management, Communication skills, Mentorship

Required

Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field (or equivalent hands-on experience)
Minimum of 8 years of experience in software engineering or SRE roles
Deep experience with cloud platforms (AWS, GCP, or Azure)
Proficiency in Java, the Spring framework, and Python (or a similar scripting language) in a Linux environment
Prior experience contributing to Site Reliability Engineering initiatives or similar operational roles
Demonstrated ability to lead projects and influence engineering culture
Knowledge of SRE principles, including SLI/SLO design, error budgets, and toil reduction strategies
Excellent written and verbal communication skills in English
Please note: This position is not eligible for immigration visa sponsorship, now or in the future

Preferred

Experience with observability and monitoring tools (e.g., Datadog)
Experience in developing and operating production-grade, scalable services using Kubernetes and elastic cloud architectures
Experience with CI/CD pipelines and tools such as ArgoCD, GitHub Actions, or similar
Experience with Infrastructure as Code (IaC) tools such as Terraform and Kustomize
Exposure to incident management practices, on-call rotations, and postmortem culture

Benefits

Generous PTO
Hybrid working options
Company equity (RSUs)
Comprehensive benefits
Extensive parental leave
Dedicated volunteer days

Company

Celonis provides an execution management system that helps companies run their business processes.

Funding

Current Stage: Late Stage
Total Funding: $2.37B
Key Investors: Qatar Investment Authority, KeyBanc Capital Markets, Arena Holdings
2023-07-15: Secondary Market
2022-08-23: Series D · $400M
2022-08-23: Debt Financing · $600M

Leadership Team

Alexander Rinke, Co-CEO
Bastian Nominacher, Co-CEO / Co-Founder
Company data provided by Crunchbase