Staff Infrastructure Engineer jobs in United States

Replit · 2 months ago

Staff Infrastructure Engineer

Replit is a software creation platform that democratizes application development for millions of users globally. They are looking for a Staff Infrastructure Engineer to ensure the reliability and scalability of their infrastructure, implementing automation and best practices to enhance performance and availability.

Artificial Intelligence (AI) · Cloud Computing · Developer Tools · Information Technology · Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Drive Automation and Infrastructure as Code: Architect, build, and improve automation to eliminate toil and operational work. Design and maintain CI/CD pipelines and infrastructure automation using tools like Terraform or Pulumi. Create self-healing systems that can automatically respond to common failure scenarios
Optimize Performance and Infrastructure: Collaborate with core infrastructure and product teams to tune and optimize the performance of our cloud deployments (Kubernetes, Docker, GCP). Identify and resolve performance bottlenecks, implement capacity planning strategies, and reduce latency across global regions
Elevate Developer Experience: Design and implement improvements to our build, test, and deployment systems to make software delivery faster, safer, and more reliable for all engineers
Drive Cross-Company Improvements: Partner directly with service owners across Replit to understand their pain points, and collaborate on implementing build/test/deploy enhancements within their specific services
Build Shared Tooling: Create and maintain centralized tooling and automation that improves the entire engineering lifecycle, from local development to production monitoring
Debug and Harden Systems: Dive deep into debugging extremely difficult technical problems, making our systems and products more robust, operable, and easier to diagnose
Provide Staff-Level Guidance: Review feature and system designs, acting as an owner for the security, scale, and operational integrity of those designs
Educate and Mentor: Educate and mentor the engineering team, and hold it accountable, to improve the reliability of our systems, making reliability a core value of the Replit engineering culture
Build and Integrate: Write high-quality, well-tested code to meet the needs of your customers, including building pipelines to integrate with third-party vendors

Qualifications

Infrastructure Engineering · Python · Go · Kubernetes · Terraform · Distributed Systems · Monitoring Solutions · Incident Management · Performance Tuning · Configuration Management · Debugging Skills · Passion for Software Creation · Communication Skills · Interpersonal Skills · Mentoring

Required

8-10 years of experience in Infrastructure Engineering or similar roles (DevOps, Systems Engineering, Site Reliability Engineering)
Strong programming skills in languages like Python or Go
You write high-quality, well-tested code
Deep understanding of distributed systems. You've designed, built, scaled, and maintained production services and know how to compose a service-oriented architecture
Experience with container orchestration platforms (Kubernetes) and cloud-native technologies
Proven track record of implementing and maintaining monitoring/observability solutions, with strong skills in debugging and performance tuning
Strong incident management skills with experience leading incident response and demonstrated critical thinking under pressure
Experience with infrastructure as code (e.g., Terraform) and configuration management tools
Excellent written and verbal communication skills, with an ability to explain technical concepts clearly and simply and a bias toward open, transparent cultural practices
Strong interpersonal skills, with experience working with engineers from junior to principal levels
A willingness to dive into understanding, debugging, and improving any layer of the stack
You're passionate about making software creation accessible and empowering the next generation of builders

Preferred

Deep experience with Google Cloud Platform (GCP) services and tools
Knowledge of modern observability platforms (Prometheus, Grafana, Datadog, etc.)
Experience designing and building reliable systems capable of handling high throughput and low latency
Experience with Go and Terraform
Familiarity with working in rapid-growth environments
Experience writing company-facing blog posts and training materials

Benefits

401(k) Program
Health, Dental, Vision and Life Insurance
Short Term and Long Term Disability
Paid Parental, Medical, Caregiver Leave
Commuter Benefits
Monthly Wellness Stipend
Autonomous Work Environment
In Office Set-Up Reimbursement
Flexible Time Off (FTO) + Holidays
Quarterly Team Gatherings
In Office Amenities

Company

Replit

Replit is the most secure agentic platform for production-ready apps.

H1B Sponsorship

Replit has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (8)
2024 (5)
2023 (2)
2022 (2)

Funding

Current Stage
Growth Stage
Total Funding
$472.02M
Key Investors
Prysm Capital · Craft Ventures · Andreessen Horowitz
2025-07-30 · Series C · $250M
2023-11-06 · Series B · $20M
2023-04-25 · Series B · $97.4M

Leadership Team

Amjad Masad
CEO
Haya Odeh
Co-Founder
Company data provided by Crunchbase