OpenAI
Software Engineer, Inference – AMD GPU Enablement
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. This role involves scaling and optimizing OpenAI's inference infrastructure across emerging GPU platforms, with a focus on AMD hardware. Engineers will collaborate across teams to ensure models run correctly and efficiently on large GPU clusters.
Agentic AI · Artificial Intelligence (AI) · Foundational AI · Generative AI · Machine Learning · Natural Language Processing · SaaS
Responsibilities
Own bring-up, correctness and performance of the OpenAI inference stack on AMD hardware
Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into a variety of GPU-backed systems
Debug and optimize distributed inference workloads across memory, network, and compute layers
Validate correctness, performance, and scalability of model execution on large GPU clusters
Collaborate with partner teams to design and optimize high-performance GPU kernels for accelerators using HIP, Triton, or other performance-focused frameworks
Collaborate with partner teams to build, integrate and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs
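To give a flavor of the last responsibility: NCCL and RCCL commonly implement all-reduce with a ring algorithm. The sketch below is a pure-Python simulation under simplifying assumptions (each "rank" is just a list, vector length divisible by the rank count); real libraries move tensors over NVLink or Infinity Fabric, not Python lists.

```python
# Simulated ring all-reduce: the communication pattern NCCL/RCCL-style
# collective libraries use to sum a tensor across GPUs in 2*(n-1) steps.

def ring_all_reduce(buffers):
    """Sum the per-rank vectors in `buffers` so every rank ends with the total."""
    n = len(buffers)                 # number of simulated ranks (GPUs)
    chunk = len(buffers[0]) // n     # vector length assumed divisible by n

    def sl(c):
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. At step s, rank r passes chunk (r - s) % n to
    # rank r+1, which adds it in. After n-1 steps, rank r owns the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            dst, c = (r + 1) % n, (r - step) % n
            buffers[dst][sl(c)] = [a + b for a, b in
                                   zip(buffers[dst][sl(c)], buffers[r][sl(c)])]

    # Phase 2: all-gather. Each completed chunk circulates the ring until
    # every rank holds the full reduced vector.
    for step in range(n - 1):
        for r in range(n):
            dst, c = (r + 1) % n, (r + 1 - step) % n
            buffers[dst][sl(c)] = buffers[r][sl(c)]

    return buffers

# 4 simulated GPUs, each holding eight copies of (rank + 1):
ranks = [[float(r + 1)] * 8 for r in range(4)]
ring_all_reduce(ranks)
# every rank now holds [10.0] * 8  (1 + 2 + 3 + 4 = 10)
```

The ring pattern matters for serving because each rank sends and receives only 2·(n−1)/n of the data regardless of ring size, keeping link bandwidth (not latency) the dominant cost on large clusters.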
Qualifications
Required
Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level performance
Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving
Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators
Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers
Are excited to be part of a small, fast-moving team building new infrastructure from first principles
Preferred
Contributions to open-source libraries like RCCL, Triton, or vLLM
Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling
Prior experience deploying inference on other non-NVIDIA GPU environments
Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models
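Tensor parallelism, mentioned in the last preferred qualification, is the standard way to fit 10B+ parameter models across GPUs: each layer's weights are sharded so every device computes only its slice. Below is a toy, pure-Python sketch of column-parallel splitting of a single linear layer; the function names and list-based "devices" are illustrative, not any real serving stack's API.

```python
# Toy column-parallel linear layer: the weight matrix is split column-wise
# across simulated "devices"; each computes its output shard, and the
# concatenation of shards (an all-gather in practice) equals the full output.

def matmul(x, w):
    """x: length-k vector; w: k x m matrix (list of rows) -> length-m vector."""
    m = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(m)]

def split_columns(w, parts):
    """Split a k x m weight matrix into `parts` equal column shards."""
    per = len(w[0]) // parts                  # assume m divisible by parts
    return [[row[p * per:(p + 1) * per] for row in w] for p in range(parts)]

def column_parallel_linear(x, w, parts):
    """Each 'device' multiplies x by its shard; concatenation = full output."""
    out = []
    for shard in split_columns(w, parts):     # conceptually runs in parallel
        out.extend(matmul(x, shard))          # concat stands in for all-gather
    return out

# Sharding across 2 "devices" reproduces the unsharded result exactly:
x = [1.0, 2.0, 3.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0]]
assert column_parallel_linear(x, w, 2) == matmul(x, w)
```

Real systems alternate column- and row-parallel splits across consecutive layers to minimize how often activations must be gathered between devices.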
Company
OpenAI
OpenAI is an AI research and deployment company that develops advanced AI models, including ChatGPT. It is a sub-organization of OpenAI Foundation.
H1B Sponsorship
OpenAI has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the data below is provided for reference. (Data powered by the US Department of Labor.)
Trends of Total Sponsorships
2025 (1)
2024 (1)
2023 (1)
2022 (18)
2021 (10)
2020 (6)
Funding
Current Stage: Growth Stage
Total Funding: $79B
Key Investors: The Walt Disney Company, SoftBank, Thrive Capital
2025-12-11 · Corporate Round · $1B
2025-10-02 · Secondary Market · $6.6B
2025-03-31 · Series Unknown · $40B
Company data provided by Crunchbase.