AMD · 4 months ago
Principal GPU Performance Engineer - Artificial Intelligence
Advanced Micro Devices, Inc. (AMD) is a leader in transforming lives with technology, focusing on next-generation computing experiences. The Principal GPU Performance Engineer will optimize AI training workloads and guide the evolution of AMD's GPU architectures, collaborating with various teams to enhance training performance and inform future designs.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor
Responsibilities
Profile and optimize large-scale AI training workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters
Identify bottlenecks in compute, memory, interconnects, and communication libraries (NCCL/RCCL, MPI), and deliver optimizations to maximize scaling efficiency
Collaborate with compiler/runtime teams to improve kernel performance, scheduling, and memory utilization
Develop and maintain benchmarks and traces representative of foundation model training workloads
Provide performance insights to AMD Instinct GPU architecture teams, informing hardware/software co-design decisions for future architectures
Partner with framework teams (PyTorch, JAX, TensorFlow) to upstream performance improvements and enable better scaling APIs
Present findings to cross-functional teams and leadership, shaping both software and hardware roadmaps
Qualifications
Required
Master's or PhD degree in Computer Science or Computer Engineering
Preferred
Strong expertise in GPU tuning and optimization (CUDA, ROCm, or equivalent)
Understanding of GPU microarchitecture (execution units, memory hierarchy, interconnects, warp scheduling)
Hands-on experience with distributed training frameworks and communication libraries (e.g., PyTorch DDP, DeepSpeed, Megatron-LM, NCCL/RCCL, MPI)
Advanced Linux OS, container (e.g. Docker) and GitHub skills
Proficiency in Python or C++ for performance-critical development
Familiarity with large-scale AI training infrastructure (NVLink, InfiniBand, PCIe, cloud/HPC clusters)
Experience in benchmarking methodologies, performance analysis/profiling (e.g. Nsight), performance monitoring tools
Experience scaling training to thousands of GPUs for foundation models a plus
Strong track record of optimizing large-scale AI systems in cloud or HPC environments is desired
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
Funding Rounds:
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Recent News
GlobeNewswire (2026-01-09)
Company data provided by Crunchbase