AMD · 5 days ago

AI/ML and GPU Performance QA Engineer

Austin, TX

Full-time

Onsite

Senior Level, Lead/Staff

$134K/yr - $202K/yr

8+ years exp

AMD is a company focused on building innovative products that enhance computing experiences across various domains. They are seeking a Senior Technical Validation Engineer to lead validation and performance engineering for Machine Learning and High-Performance Computing frameworks, ensuring the delivery of high-quality software for AI and HPC workloads.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor

Growth Opportunities

H1B Sponsor Likely

Hiring Manager

Tressa Cooper (she/her)

Responsibilities

Lead validation for ML/AI models: accuracy testing, performance benchmarking, regression, drift detection, A/B testing

Test ML frameworks: PyTorch, Hugging Face, MLflow experiment tracking

Validate a wide variety of AI models to ensure correctness in distributed training or inference

Perform GPU testing & profiling: ROCm/CUDA validation, performance profiling, memory/thermal analysis, multi-GPU scaling

Validate HPC frameworks, distributed runtimes, compilers, and GPU libraries

Build scalable CI/CD workflows for ML/HPC validation. Develop automated test pipelines using Docker, Kubernetes, GitHub Actions, Jenkins

Validate cloud-based AI workloads on AWS SageMaker, Lambda, and S3

Test benchmarks in containerized and virtualized GPU environments

Design and implement automated validation pipelines for ML frameworks (e.g., PyTorch, TensorFlow, JAX) across GPU platforms

Develop and maintain benchmarking suites for AI models and HPC workloads, focusing on performance, scalability, and regression detection

Conduct multi-node validation using orchestration tools (e.g., Slurm, MPI, Kubernetes) to simulate real-world distributed training and inference

Collaborate with hardware and software teams to validate GPU hardware platforms (NVIDIA CUDA, AMD ROCm) for ML and HPC readiness

Analyze performance metrics using profiling tools (e.g., rocprof) and provide actionable insights

Drive test content development for emerging AI workloads, including LLMs, vision models, and scientific computing benchmarks

Perform bottleneck analysis, hyperparameter validation, and competitive benchmarking

Mentor junior engineers and contribute to validation strategy, tooling, and best practices

Qualifications

GPU architecture, ML frameworks, CI/CD systems, Performance benchmarking, ROCm, CUDA, Distributed systems, Scripting Python, Scripting Bash, Communication skills, Collaboration skills

Required

Good understanding of and experience with ROCm, CUDA, GPU architecture, ML frameworks, CI/CD systems, benchmarking, and competitive analysis

Preferred

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field

8+ years of experience in validation engineering, ML infrastructure, or HPC performance testing

Strong hands-on experience with GPU platforms (NVIDIA CUDA, AMD ROCm) and their software ecosystems

Deep understanding of AI model architectures, training/inference workflows, and ML performance bottlenecks

Proven experience with CI/CD systems, Git, Docker, and automated test frameworks

Expertise in multi-node orchestration and distributed system validation

Familiarity with HPC benchmarks (e.g., HPL, HPCG, MLPerf) and AI model benchmarking methodologies

Proficiency in scripting and automation (Python, Bash, YAML) in Linux environments

Strong communication, documentation, and cross-functional collaboration skills

Benefits

AMD benefits at a glance.

Company

AMD

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

Founded in 1969

Santa Clara, California, USA

10001+ employees

http://www.amd.com

H1B Sponsorship

AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship

Trends of Total Sponsorships

2025 (836)

2024 (770)

2023 (551)

2022 (739)

2021 (519)

2020 (547)

Funding

Current Stage

Public Company

Total Funding

unknown

Key Investors

OpenAI, Daniel Loeb

2025-10-06: Post-IPO Equity

2023-03-02: Post-IPO Equity

2021-06-29: Post-IPO Equity

Leadership Team

Lisa Su

Chair & CEO

Mark Papermaster

CTO and EVP

Recent News

Hot Hardware

AMD Unveils Ryzen Halo: Hands-On With A Powerful New AI Mini-PC

2026-01-13

Morningstar.com

CES 2026: The Future is Here

2026-01-11

CRN

Qualcomm Loses Second Channel Leader Amid Snapdragon X2 PC Chip Push

2026-01-11

Company data provided by Crunchbase