Site Reliability Engineer jobs in United States
cer-icon
Apply on Employer Site
company-logo

Replit · 2 months ago

Site Reliability Engineer

Replit is a software creation platform that enables users to build applications using natural language. The Site Reliability Engineer will ensure the reliability, scalability, and performance of Replit's infrastructure, implementing automation and best practices while designing robust monitoring solutions and leading incident response efforts.

Artificial Intelligence (AI)Cloud ComputingDeveloper ToolsInformation TechnologySoftware
check
Growth Opportunities
check
H1B Sponsor Likelynote

Responsibilities

Design and Implement Observability Solutions: Develop comprehensive monitoring and alerting systems using modern observability tools. Create dashboards and metrics that provide real-time visibility into system health and performance. Implement logging strategies that enable quick problem identification and resolution
Drive Automation and Infrastructure as Code: Architect and implement infrastructure automation solutions using tools like Terraform, Ansible, or Pulumi. Design and maintain CI/CD pipelines that enable reliable and consistent deployments. Create self-healing systems that can automatically respond to common failure scenarios
Establish SLOs and SLIs: Work with product and engineering teams to define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to track and report on these metrics, ensuring we maintain high reliability standards while balancing innovation speed
Incident Management and Response: Lead incident response efforts, conducting thorough post-mortems, and implementing improvements to prevent future occurrences. Develop and maintain runbooks for critical services. Build tools and processes that reduce Mean Time To Recovery (MTTR)
Performance Optimization: Identify and resolve performance bottlenecks across our infrastructure. Implement capacity planning strategies and optimize resource utilization. Work on reducing latency and improving system efficiency across global regions

Qualification

Site Reliability EngineeringInfrastructure as CodeMonitoring SolutionsContainer OrchestrationDistributed SystemsIncident ManagementCloud TechnologiesPythonGoTerraformAnsiblePrometheusGrafanaDatadog

Required

4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)
Strong programming skills in languages commonly used for automation (Python, Go, or similar)
Deep understanding of distributed systems
Experience with container orchestration platforms (Kubernetes) and cloud-native technologies
Proven track record of implementing and maintaining monitoring/observability solutions
Strong incident management skills with experience leading incident response
Experience with infrastructure as code and configuration management tools

Preferred

Experience with Google Cloud Platform (GCP) services and tools
Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.)

Benefits

401(k) Program
Health, Dental, Vision and Life Insurance
Short Term and Long Term Disability
Paid Parental, Medical, Caregiver Leave
Commuter Benefits
Monthly Wellness Stipend
Autonoumous Work Environement
In Office Set-Up Reimbursement
Flexible Time Off (FTO) + Holidays
Quarterly Team Gatherings
In Office Amenities

Company

Replit

twittertwittertwitter
company-logo
Replit is the most secure agentic platform for production-ready apps.

H1B Sponsorship

Replit has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Below presents additional info for your reference. (Data Powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Represents job field similar to this job
Trends of Total Sponsorships
2025 (8)
2024 (5)
2023 (2)
2022 (2)

Funding

Current Stage
Growth Stage
Total Funding
$472.02M
Key Investors
Prysm CapitalCraft VenturesAndreessen Horowitz
2025-07-30Series C· $250M
2023-11-06Series B· $20M
2023-04-25Series B· $97.4M

Leadership Team

leader-logo
Amjad Masad
CEO
linkedin
leader-logo
Haya Odeh
Co-Founder
linkedin
Company data provided by crunchbase