Celonis · 1 day ago
Staff Software Engineer - Site Reliability
Celonis is the global leader in Process Intelligence technology and one of the fastest-growing SaaS firms. As a member of the Reliability Engineering team, you will ensure the health, performance, and resilience of the platform by applying advanced software engineering and SRE principles to drive system reliability and scalability.
Analytics, Artificial Intelligence (AI), Big Data, Business Intelligence, Business Process Automation (BPA), SaaS
Responsibilities
Lead reliability efforts for a fleet of 80+ FedRAMP-compliant microservices running on Kubernetes, applying SRE principles to drive observability, automation, and incident prevention
Own high-priority application incident escalations, performing deep technical analysis and restoration within defined SLOs, while continuously improving detection and response mechanisms
Engineer solutions to enhance the availability, latency, and performance of production services—automating manual processes to eliminate toil and scale operational efficiency
Collaborate closely with platform and application engineering teams to conduct post-incident reviews, extract insights, and implement systemic changes that improve overall reliability
Document operational knowledge and runbooks, embedding SRE best practices into onboarding, incident response, and platform architecture standards
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field (or equivalent hands-on experience)
Minimum of 5 years of experience building and maintaining cloud-based software applications with at least one public cloud platform (AWS, Azure, or GCP)
Proficiency in Java, the Spring framework, and Python (or a similar scripting language) in a Linux environment
Prior experience contributing to Site Reliability Engineering initiatives or similar operational roles
Knowledge of SRE principles, including SLI/SLO design, error budgets, and toil reduction strategies
Proven expertise in developing and operating production-grade, scalable services using Kubernetes and elastic cloud architectures
Strong problem-solving and troubleshooting abilities in complex, distributed systems
Excellent written and verbal communication skills in English
Preferred
Familiarity with observability and monitoring tools (e.g., Datadog)
Experience with CI/CD pipelines and tools such as ArgoCD, GitHub Actions, or similar
Experience with Infrastructure as Code (IaC) tools such as Terraform and Kustomize
Exposure to incident management practices, on-call rotations, and postmortem culture
Benefits
Generous PTO
Hybrid working options
Company equity (RSUs)
Comprehensive benefits
Extensive parental leave
Dedicated volunteer days
Gym subsidies
Counseling
Well-being programs
Company
Celonis
Celonis provides an execution management system that helps companies run their business processes.
Funding
Current Stage: Late Stage
Total Funding: $2.37B
Key Investors: Qatar Investment Authority, KeyBanc Capital Markets, Arena Holdings
2023-07-15: Secondary Market
2022-08-23: Series D · $400M
2022-08-23: Debt Financing · $600M
Recent News
2025-12-13
Company data provided by Crunchbase