Software Engineer, Infrastructure Reliability jobs in United States

OpenAI · 10 hours ago

Software Engineer, Infrastructure Reliability

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role involves designing, building, and operating reliable systems that support high-impact teams, ensuring systems are secure, observable, and performant for products like ChatGPT and the OpenAI API.

Artificial Intelligence (AI) · Generative AI · Machine Learning · Natural Language Processing · SaaS
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Design, build, and operate reliable and performant systems used across engineering
Identify and fix performance bottlenecks and inefficiencies, ensuring our infrastructure can scale to the next order of magnitude
Dig deep to resolve complex issues
Continuously improve automation to reduce manual work. Improve internal tooling and our developer experience
Contribute to incident response, postmortems, and the development of best practices around system reliability and scalability

Qualifications

Distributed systems · Cloud infrastructure · Kubernetes · Observability tools · Microservices architecture · Security best practices · Problem-solving skills · Automation · Performance optimization · Collaboration

Required

4+ years of relevant industry experience, with 2+ years leading large scale, complex projects or teams as an engineer or tech lead
A passion for distributed systems at scale with a focus on reliability, scalability, security, and continuous improvement
Proven experience as a reliability engineer, production engineer, or a similar role in a fast-paced, rapidly scaling company
Strong proficiency in cloud infrastructure (like AWS, GCP, Azure) and IaC tools such as Terraform. Proficiency in programming / scripting languages
Experience with containerization technologies and container orchestration platforms like Kubernetes
Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack
Experience with microservices architecture and service mesh technologies
Knowledge of security best practices in cloud environments
Strong understanding of distributed systems, networking, and database technologies
Excellent problem-solving skills and ability to work in a fast-paced environment

Preferred

Have a deep understanding of distributed systems principles and a proven track record in building and operating scalable and reliable systems
Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally-distributed systems
Have experience operating orchestration systems such as Kubernetes at scale and building abstractions over cloud platforms
Are comfortable working in Linux environments, and with tools like Kubernetes, Terraform, CI/CD pipelines, and modern observability stacks
Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services
Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed
Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done
Are comfortable with ambiguity and rapid change

Company

OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation.

H1B Sponsorship

OpenAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)

Funding

Current Stage: Growth Stage
Total Funding: $79B
Key Investors: The Walt Disney Company, SoftBank, Thrive Capital
2025-12-11 · Corporate Round · $1B
2025-10-02 · Secondary Market · $6.6B
2025-03-31 · Series Unknown · $40B

Leadership Team

Sam Altman · CEO & Co-Founder
Greg Brockman · President, Chairman, & Co-Founder
Company data provided by Crunchbase