OpenAI · 10 hours ago
Software Engineer, Infrastructure Reliability
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. The role involves designing, building, and operating reliable systems that support high-impact teams, ensuring systems are secure, observable, and performant for products like ChatGPT and the OpenAI API.
Artificial Intelligence (AI) · Generative AI · Machine Learning · Natural Language Processing · SaaS
Responsibilities
Design, build, and operate reliable and performant systems used across engineering
Identify and fix performance bottlenecks and inefficiencies, ensuring our infrastructure can scale to the next order of magnitude
Dig deep to resolve complex issues
Continuously improve automation to reduce manual work. Improve internal tooling and our developer experience
Contribute to incident response, postmortems, and the development of best practices around system reliability and scalability
Qualifications
Required
4+ years of relevant industry experience, with 2+ years leading large-scale, complex projects or teams as an engineer or tech lead
A passion for distributed systems at scale with a focus on reliability, scalability, security, and continuous improvement
Proven experience as a reliability engineer, production engineer, or a similar role in a fast-paced, rapidly scaling company
Strong proficiency in cloud infrastructure (like AWS, GCP, Azure) and IaC tools such as Terraform. Proficiency in programming / scripting languages
Experience with containerization technologies and container orchestration platforms like Kubernetes
Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk, and the ELK stack
Experience with microservices architecture and service mesh technologies
Knowledge of security best practices in cloud environments
Strong understanding of distributed systems, networking, and database technologies
Excellent problem-solving skills and ability to work in a fast-paced environment
Preferred
Have a deep understanding of distributed systems principles and a proven track record in building and operating scalable and reliable systems
Have a keen eye for performance and optimization. You know how to squeeze the most performance out of complex, globally distributed systems
Have experience operating orchestration systems such as Kubernetes at scale and building abstractions over cloud platforms
Are comfortable working in Linux environments, and with tools like Kubernetes, Terraform, CI/CD pipelines, and modern observability stacks
Are experienced in collaborating with cross-functional teams to ensure that reliability and scalability are considered in the design and development of new features and services
Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed
Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done
Are comfortable with ambiguity and rapid change
Company
OpenAI
OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation.
H1B Sponsorship
OpenAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)
Funding
Current Stage: Growth Stage
Total Funding: $79B
Key Investors: The Walt Disney Company, SoftBank, Thrive Capital
2025-12-11 · Corporate Round · $1B
2025-10-02 · Secondary Market · $6.6B
2025-03-31 · Series Unknown · $40B
Company data provided by Crunchbase