OpenAI · 1 day ago
Training Performance Engineer
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. As a Training Performance Engineer, you will drive efficiency improvements across the distributed training stack, analyze large-scale training runs, and design optimizations to enhance throughput and uptime for model training.
Agentic AI, Artificial Intelligence (AI), Foundational AI, Generative AI, Machine Learning, Natural Language Processing, SaaS
Responsibilities
Profile end-to-end training runs to identify performance bottlenecks across compute, communication, and storage
Optimize GPU utilization and throughput for large-scale distributed model training
Collaborate with runtime and systems engineers to improve kernel efficiency, scheduling, and collective communication performance
Implement model graph transforms to improve end-to-end throughput
Build tooling to monitor and visualize MFU, throughput, and uptime across clusters
Partner with researchers to ensure new model architectures scale efficiently during pre-training
Contribute to infrastructure decisions that improve reliability and efficiency of large training jobs
Qualifications
Required
Strong programming skills in Python and C++ (Rust or CUDA a plus)
Experience running distributed training jobs on multi-GPU systems or HPC clusters
Enjoy debugging complex distributed systems and measuring efficiency rigorously
Exposure to frameworks like PyTorch, JAX, or TensorFlow and an understanding of how large-scale training loops are built
Comfortable collaborating across teams and translating raw profiling data into practical engineering improvements
Preferred
Familiarity with NCCL, MPI, or UCX communication libraries
Experience with large-scale data loading and checkpointing systems
Prior work on training runtime, distributed scheduling, or ML compiler optimization
Benefits
Relocation assistance for new employees
Company
OpenAI
OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation.
H1B Sponsorship
OpenAI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)
Funding
Current Stage
Growth Stage
Total Funding
$79B
Key Investors
The Walt Disney Company, SoftBank, Thrive Capital
2025-12-11 · Corporate Round · $1B
2025-10-02 · Secondary Market · $6.6B
2025-03-31 · Series Unknown · $40B
Company data provided by Crunchbase