AMD · 2 months ago
Director of Machine Learning Engineering -- Training and Performance
Advanced Micro Devices, Inc. (AMD) is a technology company focused on building innovative products for next-generation computing experiences. The company is seeking a Director of Machine Learning Engineering to define and execute the technical vision for distributed training of large-scale generative AI and recommendation models, guiding a world-class engineering team to optimize model performance and efficiency.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Responsibilities
Define and drive AMD’s distributed training strategy for large-scale generative and recommendation models
Architect and optimize distributed training pipelines (pre-training, SFT, RL, etc.) for large-scale models
Lead development of high-performance, reliable training pipelines that scale across thousands of GPUs
Partner with compiler, runtime, system software, and hardware architecture teams to co-design solutions that maximize end-to-end performance
Build, mentor, and empower a team of expert engineers focused on innovation, collaboration, and technical excellence
Drive AMD’s engagement in open-source communities through contributions to frameworks such as PyTorch, JAX, TorchTitan, and Megatron-LM
Stay ahead of emerging advances in distributed training, LLMs, recommendation systems, and AI infrastructure — and translate them into scalable engineering practices
Qualifications
Required
10+ years in machine learning, distributed systems, or AI infrastructure; 5+ years in technical leadership or management roles
Proven experience building and optimizing distributed training systems for large models
Strong familiarity with ML frameworks (PyTorch, JAX, TensorFlow) and distributed frameworks (TorchTitan, Megatron-LM)
Hands-on expertise with LLMs, recommendation systems, or ranking models
Proficiency in Python and C++, including performance profiling, debugging, and large-scale optimization
Experience collaborating across hardware, compiler, and system software layers
Excellent communication, leadership, and problem-solving skills with the ability to influence across organizations and external partners
Master's or Ph.D. in Computer Science, Artificial Intelligence, Machine Learning, or a related field
Preferred
Experience in both model-level and application-level development and optimization
Benefits
AMD benefits at a glance
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
H1B Sponsorship
AMD has a track record of offering H1B sponsorships. Please note that this does not
guarantee sponsorship for this specific role. Additional information is presented below
for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Company data provided by Crunchbase