Advanced Microdevices Pvt. Ltd. (India) · 2 months ago

Principal/Senior GPU Software Performance Engineer — Training at Scale

San Jose, CA

Full-time

Onsite

Senior Level, Lead/Staff

Advanced Micro Devices, Inc. is dedicated to building innovative products that enhance computing experiences. The Principal/Senior GPU Software Performance Engineer will optimize GPU performance for large-model training across multi-GPU clusters, working with framework, compiler, and silicon teams to improve efficiency and throughput.

Biopharma · Biotechnology · Industrial Manufacturing

Responsibilities

Own kernel performance: Design, implement, and land high‑impact HIP/C++ kernels (e.g., attention, layernorm, softmax, GEMM/epilogues, fused pointwise) that are wave‑size portable and optimized for LDS, caches, and MFMA units

Lead profiling & tuning: Build repeatable workflows with timelines, hardware counters, and roofline analysis; remove memory bottlenecks; tune launch geometry/occupancy; validate speedups with A/B harnesses

Drive fusion & algorithmic improvements: Identify profitable fusions, tiling strategies, vectorized I/O, shared‑memory/scratchpad layouts, asynchronous pipelines, and warp/wave‑level collectives—while maintaining numerical stability

Influence frameworks & libraries: Upstream or extend performance‑critical ops in PyTorch/JAX/XLA/Triton; evaluate and integrate vendor math libraries; guide compiler/codegen choices for target architectures

Scale beyond one GPU: Optimize P2P and collective comms, overlap compute/comm, and improve data/pipeline/tensor parallelism throughput across nodes

Benchmarking & SLOs: Define and own KPIs (throughput, time‑to‑train, $/step, energy/step); maintain dashboards, perf CI gates, and regression triage

Technical leadership: Mentor senior engineers, set coding/perf standards, lead performance “war rooms,” and partner with silicon/vendor teams on microarchitecture‑aware optimizations

Quality & reliability: Build reproducible perf harnesses, deterministic test modes, and documentation/playbooks so improvements persist release‑over‑release

Qualifications

GPU performance engineering · C++17+ · CUDA/HIP/SYCL · GPU microarchitecture · Profiling & analysis tools · Linux fundamentals · Distributed training · Mixed precision · Compiler/IR knowledge · Cluster orchestration

Required

Experience in systems/HPC/ML performance engineering, with hands-on GPU kernel work and shipped optimizations in production training or HPC

Expert in modern C++ (C++17+) and at least one GPU programming model (CUDA, HIP, or SYCL/oneAPI) or a GPU kernel DSL (e.g., Triton); comfortable with templates, memory qualifiers, atomics, and warp/wave-level collectives

Deep understanding of GPU microarchitecture: SIMT execution, occupancy vs. register/scratchpad pressure, memory hierarchy (global/L2/shared or LDS), coalescing, bank conflicts, vectorization, and instruction-level parallelism

Proficiency with profiling & analysis: timelines and counters (e.g., Nsight Systems/Compute, rocprof/Omniperf, VTune/GPA or equivalents), ISA/disassembly inspection, and correlating metrics to code changes

Proven track record of reducing time-to-train or $-per-step via kernel and collective-comms optimizations on multi-GPU clusters

Strong Linux fundamentals (perf/eBPF, NUMA, PCIe/links), build systems (CMake/Bazel), Python, and containerized dev (Docker/Podman)

Experience with distributed training (PyTorch DDP/FSDP/ZeRO/DeepSpeed or JAX) and GPU collectives

Expertise in mixed precision (BF16/FP16/FP8), numerics, and stability/accuracy validation at kernel boundaries

Background in compiler/IR (LLVM/MLIR) or codegen for GPU backends; ability to guide optimization passes with performance goals

Hands-on experience with cluster orchestration (Slurm/Kubernetes), IB/RDMA tuning, and compute/communication overlap strategies

Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Benefits

AMD benefits at a glance.

Company

Advanced Microdevices Pvt. Ltd. (India)

Advanced Microdevices (mdi) is a leader in innovative membrane technologies.

Founded in 1976

Ambala, Haryana, IND

501-1000 employees

https://mdimembrane.com

Funding

Current Stage

Late Stage

Leadership Team

Nalini Kant Gupta

Founder & Managing Director

Company data provided by Crunchbase