Advanced Microdevices Pvt. Ltd. (India) · 4 months ago
Principal GPU Performance Engineer - Artificial Intelligence
Advanced Micro Devices, Inc. is focused on transforming lives with its technology to enrich various industries and communities. The company is seeking a Principal GPU Performance Engineer to optimize AI training workloads and guide the evolution of next-generation AMD Instinct GPU architectures, collaborating across software and hardware to improve efficiency and performance.
Biopharma, Biotechnology, Industrial, Manufacturing
Responsibilities
Profile and optimize large-scale AI training workloads (transformers, multimodal, diffusion, recommender systems) across multi-node, multi-GPU clusters
Identify bottlenecks in compute, memory, interconnects, and communication libraries (NCCL/RCCL, MPI), and deliver optimizations to maximize scaling efficiency
Collaborate with compiler/runtime teams to improve kernel performance, scheduling, and memory utilization
Develop and maintain benchmarks and traces representative of foundation model training workloads
Provide performance insights to AMD Instinct GPU architecture teams, informing hardware/software co-design decisions for future architectures
Partner with framework teams (PyTorch, JAX, TensorFlow) to upstream performance improvements and enable better scaling APIs
Present findings to cross-functional teams and leadership, shaping both software and hardware roadmaps
Qualification
Required
Master's or PhD degree in Computer Science or Computer Engineering
Preferred
Strong expertise in GPU tuning and optimization (CUDA, ROCm, or equivalent)
Understanding of GPU microarchitecture (execution units, memory hierarchy, interconnects, warp scheduling)
Hands-on experience with distributed training frameworks and communication libraries (e.g., PyTorch DDP, DeepSpeed, Megatron-LM, NCCL/RCCL, MPI)
Advanced Linux OS, container (e.g., Docker), and GitHub skills
Proficiency in Python or C++ for performance-critical development
Familiarity with large-scale AI training infrastructure (NVLink, InfiniBand, PCIe, cloud/HPC clusters)
Experience with benchmarking methodologies, performance analysis and profiling (e.g., Nsight), and performance monitoring tools
Experience scaling training to thousands of GPUs for foundation models is a plus
Strong track record of optimizing large-scale AI systems in cloud or HPC environments is desired
Benefits
AMD benefits at a glance.
Company
Advanced Microdevices Pvt. Ltd. (India)
Advanced Microdevices (mdi) is a leader in innovative membrane technologies.
Funding
Current Stage
Late Stage
Leadership Team
Nalini Kant Gupta
Founder & Managing Director
Company data provided by crunchbase