Training Performance Engineer jobs in United States
cer-icon
Apply on Employer Site
company-logo

OpenAI · 1 day ago

Training Performance Engineer

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. As a Training Performance Engineer, you will drive efficiency improvements across the distributed training stack, analyze large-scale training runs, and design optimizations to enhance throughput and uptime for model training.

Agentic AIArtificial Intelligence (AI)Foundational AIGenerative AIMachine LearningNatural Language ProcessingSaaS
check
Growth Opportunities
check
H1B Sponsor Likelynote

Responsibilities

Profile end-to-end training runs to identify performance bottlenecks across compute, communication, and storage
Optimize GPU utilization and throughput for large-scale distributed model training
Collaborate with runtime and systems engineers to improve kernel efficiency, scheduling, and collective communication performance
Implement model graph transforms to improve end to end throughput
Build tooling to monitor and visualize MFU, throughput, and uptime across clusters
Partner with researchers to ensure new model architectures scale efficiently during pre-training
Contribute to infrastructure decisions that improve reliability and efficiency of large training jobs

Qualification

PythonC++Distributed trainingGPU optimizationPerformance engineeringPyTorchJAXTensorFlowCUDANCCLMPIUCXDebuggingCollaboration

Required

Strong programming skills in Python and C++ (Rust or CUDA a plus)
Experience running distributed training jobs on multi-GPU systems or HPC clusters
Enjoy debugging complex distributed systems and measuring efficiency rigorously
Exposure to frameworks like PyTorch, JAX, or TensorFlow and an understanding of how large-scale training loops are built
Comfortable collaborating across teams and translating raw profiling data into practical engineering improvements

Preferred

Familiarity with NCCL, MPI, or UCX communication libraries
Experience with large-scale data loading and checkpointing systems
Prior work on training runtime, distributed scheduling, or ML compiler optimization

Benefits

Relocation assistance to new employees

Company

OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation.

H1B Sponsorship

OpenAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Below presents additional info for your reference. (Data Powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Represents job field similar to this job
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)

Funding

Current Stage
Growth Stage
Total Funding
$79B
Key Investors
The Walt Disney CompanySoftBankThrive Capital
2025-12-11Corporate Round· $1B
2025-10-02Secondary Market· $6.6B
2025-03-31Series Unknown· $40B

Leadership Team

leader-logo
Sam Altman
CEO & Co-Founder
leader-logo
Greg Brockman
President, Chairman, & Co-Founder
linkedin

Recent News

Company data provided by crunchbase