Replit · 2 months ago

Site Reliability Engineer

Foster City, CA

Full-time

Hybrid

Mid, Senior Level

$160K/yr - $250K/yr

4+ years exp

Replit is a software creation platform that enables users to build applications using natural language. The Site Reliability Engineer will ensure the reliability, scalability, and performance of Replit's infrastructure, implementing automation and best practices while designing robust monitoring solutions and leading incident response efforts.

Artificial Intelligence (AI) · Cloud Computing · Developer Tools · Information Technology · Software

Growth Opportunities

H1B Sponsor Likely

Responsibilities

Design and Implement Observability Solutions: Develop comprehensive monitoring and alerting systems using modern observability tools. Create dashboards and metrics that provide real-time visibility into system health and performance. Implement logging strategies that enable quick problem identification and resolution

Drive Automation and Infrastructure as Code: Architect and implement infrastructure automation solutions using tools like Terraform, Ansible, or Pulumi. Design and maintain CI/CD pipelines that enable reliable and consistent deployments. Create self-healing systems that can automatically respond to common failure scenarios

Establish SLOs and SLIs: Work with product and engineering teams to define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Build systems to track and report on these metrics, ensuring we maintain high reliability standards while balancing innovation speed

Incident Management and Response: Lead incident response efforts, conduct thorough post-mortems, and implement improvements to prevent future occurrences. Develop and maintain runbooks for critical services. Build tools and processes that reduce Mean Time To Recovery (MTTR)

Performance Optimization: Identify and resolve performance bottlenecks across our infrastructure. Implement capacity planning strategies and optimize resource utilization. Work on reducing latency and improving system efficiency across global regions

Qualifications

Site Reliability Engineering · Infrastructure as Code · Monitoring Solutions · Container Orchestration · Distributed Systems · Incident Management · Cloud Technologies · Python · Go · Terraform · Ansible · Prometheus · Grafana · Datadog

Required

4-8 years of experience in Site Reliability Engineering or similar roles (DevOps, Systems Engineering, Infrastructure Engineering)

Strong programming skills in languages commonly used for automation (Python, Go, or similar)

Deep understanding of distributed systems

Experience with container orchestration platforms (Kubernetes) and cloud-native technologies

Proven track record of implementing and maintaining monitoring/observability solutions

Strong incident management skills with experience leading incident response

Experience with infrastructure as code and configuration management tools

Preferred

Experience with Google Cloud Platform (GCP) services and tools

Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.)

Benefits

401(k) Program

Health, Dental, Vision and Life Insurance

Short Term and Long Term Disability

Paid Parental, Medical, Caregiver Leave

Commuter Benefits

Monthly Wellness Stipend

Autonomous Work Environment

In Office Set-Up Reimbursement

Flexible Time Off (FTO) + Holidays

Quarterly Team Gatherings

In Office Amenities

Company

Replit

Replit is the most secure agentic platform for production-ready apps.

Founded in 2016

Foster City, California, USA

51-200 employees

https://replit.com

H1B Sponsorship

Replit has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships

2025 (8)

2024 (5)

2023 (2)

2022 (2)

Funding

Current Stage

Growth Stage

Total Funding

$472.02M

Key Investors

Prysm Capital · Craft Ventures · Andreessen Horowitz

2025-07-30 · Series C · $250M

2023-11-06 · Series B · $20M

2023-04-25 · Series B · $97.4M

Leadership Team

Amjad Masad

CEO

Haya Odeh

Co-Founder

Recent News

Benzinga.com

Replit CEO Says 'Anyone Who Has Ideas Should Potentially Be Wealthy…That's The True Promise Of Capitalism'

2026-01-25

WebProNews

Replit’s AI Vibe Coding Revolutionizes iOS App Development by 2026

2026-01-19

36kr.com

After laying off 50% of the staff, he achieved a remarkable comeback through AI programming. His Annual Recurring Revenue (ARR) exceeded 100 million, and the company's valuation soared to 60 billion.

2026-01-17

Company data provided by Crunchbase