AMD · 2 months ago
Principal/Senior GPU Software Performance Engineer — Training at Scale
AMD is a company dedicated to building products that power next-generation computing experiences. The Principal/Senior GPU Software Performance Engineer will lead kernel-level performance engineering to optimize training across multi-GPU clusters, collaborating with researchers and framework teams to improve training efficiency.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor
Responsibilities
Own kernel performance: Design, implement, and land high-impact HIP/C++ kernels (e.g., attention, layernorm, softmax, GEMM/epilogues, fused pointwise) that are wave-size portable and optimized for LDS, caches, and MFMA units
Lead profiling & tuning: Build repeatable workflows with timelines, hardware counters, and roofline analysis; remove memory bottlenecks; tune launch geometry/occupancy; validate speedups with A/B harnesses
Drive fusion & algorithmic improvements: Identify profitable fusions, tiling strategies, vectorized I/O, shared-memory/scratchpad layouts, asynchronous pipelines, and warp/wave-level collectives—while maintaining numerical stability
Influence frameworks & libraries: Upstream or extend performance-critical ops in PyTorch/JAX/XLA/Triton; evaluate and integrate vendor math libraries; guide compiler/codegen choices for target architectures
Scale beyond one GPU: Optimize P2P and collective comms, overlap compute/comm, and improve data/pipeline/tensor parallelism throughput across nodes
Benchmarking & SLOs: Define and own KPIs (throughput, time-to-train, $/step, energy/step); maintain dashboards, perf CI gates, and regression triage
Technical leadership: Mentor senior engineers, set coding/perf standards, lead performance “war rooms,” and partner with silicon/vendor teams on microarchitecture-aware optimizations
Quality & reliability: Build reproducible perf harnesses, deterministic test modes, and documentation/playbooks so improvements persist release-over-release
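The roofline analysis called out in the profiling bullet above reduces to comparing a kernel's arithmetic intensity against the machine balance point. A minimal sketch, where the peak compute and bandwidth figures are illustrative placeholders rather than specs for any particular GPU:

```python
# Hedged sketch: classify a kernel as memory- or compute-bound with a
# simple roofline model. Peak numbers below are hypothetical placeholders.

PEAK_TFLOPS = 100.0   # hypothetical peak compute throughput, TFLOP/s
PEAK_BW_TBS = 2.0     # hypothetical peak memory bandwidth, TB/s
MACHINE_BALANCE = PEAK_TFLOPS / PEAK_BW_TBS  # FLOP/byte at the ridge point

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs executed per byte moved to/from global memory."""
    return flops / bytes_moved

def attainable_tflops(ai: float) -> float:
    """Roofline: attainable throughput = min(compute roof, bandwidth * AI)."""
    return min(PEAK_TFLOPS, PEAK_BW_TBS * ai)

# Example: an elementwise add on N fp32 values does N FLOPs and moves
# 12 bytes per element (two loads + one store), so AI = 1/12 FLOP/byte.
N = 1 << 20
ai = arithmetic_intensity(N, 12 * N)
print(f"AI = {ai:.3f} FLOP/byte, attainable = {attainable_tflops(ai):.3f} TFLOP/s")
print("memory-bound" if ai < MACHINE_BALANCE else "compute-bound")
```

Kernels that land far below the balance point are candidates for the fusion and vectorized-I/O work described above, since fusing ops raises FLOPs per byte moved.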
Qualifications
Required
Experience in systems/HPC/ML performance engineering, with hands-on GPU kernel work and shipped optimizations in production training or HPC
Expert in modern C++ (C++17+) and at least one GPU programming model (CUDA, HIP, or SYCL/oneAPI) or a GPU kernel DSL (e.g., Triton); comfortable with templates, memory qualifiers, atomics, and warp/wave-level collectives
Deep understanding of GPU microarchitecture: SIMT execution, occupancy vs. register/scratchpad pressure, memory hierarchy (global/L2/shared or LDS), coalescing, bank conflicts, vectorization, and instruction-level parallelism
Proficiency with profiling & analysis: timelines and counters (e.g., Nsight Systems/Compute, rocprof/Omniperf, VTune/GPA or equivalents), ISA/disassembly inspection, and correlating metrics to code changes
Proven track record reducing time-to-train or $-per-step via kernel and collective-comms optimizations on multi-GPU clusters
Strong Linux fundamentals (perf/eBPF, NUMA, PCIe/links), build systems (CMake/Bazel), Python, and containerized dev (Docker/Podman)
Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
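The $-per-step and energy-per-step metrics referenced above are simple arithmetic over step time, cluster size, price, and power; the figures below are hypothetical, chosen only to show how a kernel-level speedup flows directly into both KPIs:

```python
# Hedged sketch: cost and energy KPIs from hypothetical cluster numbers.

def dollars_per_step(step_time_s: float, num_gpus: int,
                     dollars_per_gpu_hour: float) -> float:
    """Cost of one training step across the whole cluster."""
    return step_time_s / 3600.0 * num_gpus * dollars_per_gpu_hour

def energy_per_step_joules(step_time_s: float, num_gpus: int,
                           avg_watts_per_gpu: float) -> float:
    """Energy of one training step (joules = watts * seconds)."""
    return step_time_s * num_gpus * avg_watts_per_gpu

# A 10% step-time reduction shows up as a 10% cut in both KPIs.
base = dollars_per_step(1.0, 512, 2.0)  # 1 s/step, 512 GPUs, $2/GPU-hour
fast = dollars_per_step(0.9, 512, 2.0)
print(f"${base:.4f}/step -> ${fast:.4f}/step")
```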
Preferred
Experience with distributed training (PyTorch DDP/FSDP/ZeRO/DeepSpeed or JAX) and GPU collectives
Expertise in mixed precision (BF16/FP16/FP8), numerics, and stability/accuracy validation at kernel boundaries
Background in compiler/IR (LLVM/MLIR) or codegen for GPU backends; ability to guide optimization passes with performance goals
Hands-on with cluster orchestration (Slurm/Kubernetes), IB/RDMA tuning, and compute/communication overlap strategies
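The numerics validation at kernel boundaries mentioned above typically means comparing a low-precision result against a high-precision reference. A minimal sketch using float16 as a stand-in (NumPy has no native BF16/FP8), with a hypothetical `rel_error` helper:

```python
import numpy as np

# Hedged sketch: compare low-precision accumulation against an fp64
# reference, the kind of check run at a kernel boundary.
rng = np.random.default_rng(0)
x = rng.random(1 << 16).astype(np.float16)   # positive fp16 inputs

ref = np.sum(x.astype(np.float64))           # fp64 reference sum

naive = np.float16(0)                        # fp16 accumulator:
for v in x:                                  # stalls once the running sum's
    naive = np.float16(naive + v)            # ULP exceeds the addends

acc32 = np.sum(x, dtype=np.float32)          # fp32 accumulation of fp16 inputs

def rel_error(approx, reference):
    """Relative error versus the fp64 reference (hypothetical helper)."""
    return abs(float(approx) - reference) / max(abs(reference), 1e-30)

print("fp16 accumulate rel. error:", rel_error(naive, ref))
print("fp32 accumulate rel. error:", rel_error(acc32, ref))
```

The fp16 accumulator stops making progress once the running sum reaches 2048, where the fp16 spacing exceeds the sub-unit addends, which is exactly why accumulation precision is validated separately from storage precision.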
Benefits
AMD benefits at a glance
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
H1B Sponsorship
AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor.)
Trends of Total Sponsorships: 2025 (836), 2024 (770), 2023 (551), 2022 (739), 2021 (519), 2020 (547)
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
Funding Rounds:
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Recent News
Italian Startups - Startupbusiness.it (2026-01-07)
Company data provided by crunchbase