AMD · 9 hours ago
Principal ML Engineer - Large Scale Training Performance Optimization
AMD is a company focused on building products that accelerate next-generation computing experiences. They are seeking a Principal Machine Learning Engineer to improve training efficiency for large models on GPUs, particularly in the context of generative AI at scale.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Responsibilities
Train large models to convergence on AMD GPUs at scale
Improve the end-to-end training pipeline performance
Optimize the distributed training pipeline and algorithm to scale out
Contribute your changes to open source
Stay up-to-date with the latest training algorithms
Influence the direction of the AMD AI platform
Collaborate across teams with various groups and stakeholders
Qualifications
Required
Experience with distributed training pipelines
Knowledgeable in distributed training algorithms (Data Parallel, Tensor Parallel, Pipeline Parallel, Expert Parallel, ZeRO)
Familiar with training large models at scale
A master's or PhD degree in Computer Science, Artificial Intelligence, Machine Learning, or a related field
Preferred
Experience with ML/DL frameworks such as PyTorch, JAX, or TensorFlow
Experience with distributed training and distributed training frameworks, such as Megatron-LM, MaxText, or TorchTitan
Experience with LLMs or computer vision, especially large models
Experience with GPU kernel optimization
Excellent Python or C++ programming skills, including debugging, profiling, and performance analysis at scale
Experience with ML infra at kernel, framework, or system level
Strong communication and problem-solving skills
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
H1B Sponsorship
AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship, with the job field similar to this job marked]
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Company data provided by Crunchbase