AMD · 8 hours ago

Principal GPU Performance Engineer - Artificial Intelligence

Santa Clara, CA

Full-time

Onsite

Lead/Staff

AMD is a leading company in computing technologies, dedicated to building innovative products that enhance computing experiences. The Principal GPU Performance Engineer will optimize AI training and inference workloads while collaborating with various teams to improve the efficiency of AMD's GPU architectures.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor

Growth Opportunities

No H1B

Responsibilities

Profile and optimize large-scale AI training and inference workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters

Identify bottlenecks in compute, memory, interconnects, and communication libraries (NCCL/RCCL, MPI), and deliver optimizations to maximize scaling efficiency

Collaborate with compiler/runtime teams to improve kernel performance, scheduling, and memory utilization

Develop, maintain and recommend benchmarks representative of foundation model AI training and inference workloads

Provide performance insights to AMD Instinct GPU architecture teams, informing hardware/software co-design decisions for future architectures

Partner with framework teams (PyTorch, JAX, TensorFlow) to upstream performance improvements and enable better scaling APIs

Present findings to cross-functional teams and leadership, shaping both software and hardware roadmaps

Qualifications

GPU tuning · Optimization · Distributed training frameworks · Performance analysis/profiling · Python · C++ · Large-scale AI infrastructure · Advanced Linux OS · Curiosity · Innovation · Collaboration skills

Required

Strong expertise in GPU tuning and optimization (CUDA, ROCm, or equivalent)

Understanding of GPU microarchitecture (execution units, memory hierarchy, interconnects, warp scheduling)

Hands-on experience with distributed training and inference frameworks and communication libraries (e.g., PyTorch DDP, DeepSpeed, Megatron-LM, NCCL/RCCL, MPI)

Advanced Linux OS, container (e.g., Docker), and GitHub skills

Proficiency in Python or C++ for performance-critical development

Familiarity with large-scale AI training and inference infrastructure (NVLink, InfiniBand, PCIe, cloud/HPC clusters)

Experience in benchmarking methodologies, performance analysis/profiling (e.g., Nsight), and performance monitoring tools

Experience scaling training to thousands of GPUs for foundation models is a plus

Strong track record of optimizing large-scale AI systems or HPC environments is desired

Master's or PhD degree in Computer Science or Computer Engineering

Benefits

AMD benefits at a glance.

Company

AMD

Glassdoor: 4.1

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

Founded in 1969

Santa Clara, California, USA

10,001+ employees

http://www.amd.com

Funding

Current Stage

Public Company

Total Funding

unknown

Key Investors

OpenAI · Daniel Loeb

2025-10-06 · Post-IPO Equity

2023-03-02 · Post-IPO Equity

2021-06-29 · Post-IPO Equity

Leadership Team

Lisa Su

Chair & CEO

Mark Papermaster

CTO and EVP

Recent News

CRN

AMD CEO On ‘Next Phase Of AI’ Investments In EPYC, Ryzen, GPUs And Partners In 2026

2026-01-23

PCMag.com - Technology Product Reviews, News, Prices & Tips

AMD Ryzen 7 9850X3D Launches on Jan. 29 for $499

2026-01-23

KitGuru.net

AMD confirms Ryzen 7 9850X3D for late January release, $499

2026-01-23

Company data provided by Crunchbase