AMD · 4 months ago
Principal GPU Performance Engineer - Artificial Intelligence
Advanced Micro Devices, Inc. (AMD) is a leader in transforming lives with technology, focusing on next-generation computing experiences. The Principal GPU Performance Engineer will optimize AI training workloads and guide the evolution of AMD's GPU architectures, collaborating with various teams to enhance training performance and inform future designs.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor
Responsibilities
Profile and optimize large-scale AI training workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters
Identify bottlenecks in compute, memory, interconnects, and communication libraries (NCCL/RCCL, MPI), and deliver optimizations to maximize scaling efficiency
Collaborate with compiler/runtime teams to improve kernel performance, scheduling, and memory utilization
Develop and maintain benchmarks and traces representative of foundation model training workloads
Provide performance insights to AMD Instinct GPU architecture teams, informing hardware/software co-design decisions for future architectures
Partner with framework teams (PyTorch, JAX, TensorFlow) to upstream performance improvements and enable better scaling APIs
Present findings to cross-functional teams and leadership, shaping both software and hardware roadmaps
Qualifications
Required
Master's or PhD degree in Computer Science or Computer Engineering
Preferred
Strong expertise in GPU tuning and optimization (CUDA, ROCm, or equivalent)
Understanding of GPU microarchitecture (execution units, memory hierarchy, interconnects, warp scheduling)
Hands-on experience with distributed training frameworks and communication libraries (e.g., PyTorch DDP, DeepSpeed, Megatron-LM, NCCL/RCCL, MPI)
Advanced Linux OS, container (e.g. Docker) and GitHub skills
Proficiency in Python or C++ for performance-critical development
Familiarity with large-scale AI training infrastructure (NVLink, InfiniBand, PCIe, cloud/HPC clusters)
Experience in benchmarking methodologies, performance analysis/profiling (e.g. Nsight), performance monitoring tools
Experience scaling training to thousands of GPUs for foundation models a plus
Strong track record of optimizing large-scale AI systems in cloud or HPC environments is desired
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
Funding Rounds:
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Recent News
GlobeNewswire (2026-01-09)
Company data provided by Crunchbase