Software Engineer, Fleet Hardware Health jobs in United States
cer-icon
Apply on Employer Site
company-logo

OpenAI · 6 months ago

Software Engineer, Fleet Hardware Health

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. As a software engineer on the Fleet Hardware team, you will be responsible for the reliability and uptime of OpenAI’s compute fleet, developing automation systems and tools to monitor server health and performance. This role requires a focus on system-level investigations and the development of automated solutions to maintain the health and efficiency of supercomputing infrastructure.

Agentic AIArtificial Intelligence (AI)Foundational AIGenerative AIMachine LearningNatural Language ProcessingSaaS
check
Growth Opportunities
check
H1B Sponsor Likelynote

Responsibilities

Build and maintain automation systems for provisioning and managing server fleets
Develop tools to monitor server health, performance, and lifecycle events
Collaborate with clusters, networking, and infrastructure teams
Partner with external operators to ensure a high level of quality
Identify and fix performance bottlenecks and inefficiencies
Continuously improve automation to reduce manual work

Qualification

Large-scale server managementPythonGo proficiencyLinux knowledgeNetworking knowledgeSQL proficiencyMonitoring tools familiarityAutomation developmentPerformance optimizationCollaboration skills

Required

Experience managing large-scale server environments
A balance of strengths in building and operationalizing
Proficiency in Python, Go, or similar languages
Strong Linux, networking, and server hardware knowledge
Comfort digging into noisy data with SQL, PromQL, and Pandas or any other tool

Preferred

Experience with low level details of hardware components, protocols, and associated Linux tooling (e.g., PCIe, Infiniband, networking, power management, kernel perf tuning)
Knowledge of hardware management protocols (e.g., IPMI, Redfish)
High-performance computing (HPC) or distributed systems experience
Prior experience developing, managing, or designing hardware
Familiarity with monitoring tools (e.g., Prometheus, Grafana)

Company

OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation.

H1B Sponsorship

OpenAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Below presents additional info for your reference. (Data Powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Represents job field similar to this job
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)

Funding

Current Stage
Growth Stage
Total Funding
$79B
Key Investors
The Walt Disney CompanySoftBankThrive Capital
2025-12-11Corporate Round· $1B
2025-10-02Secondary Market· $6.6B
2025-03-31Series Unknown· $40B

Leadership Team

leader-logo
Sam Altman
CEO & Co-Founder
leader-logo
Greg Brockman
President, Chairman, & Co-Founder
linkedin
Company data provided by crunchbase