AI/ML Validation Engineer jobs in United States

Advanced Microdevices Pvt. Ltd. (India) · 1 day ago

AI/ML Validation Engineer

Advanced Micro Devices, Inc. (AMD) is dedicated to building innovative products that enhance computing experiences across various domains. The company is seeking an AI solutions validation engineer to validate AI solutions for distributed training and inference workloads, build automation for these workloads, and develop new technologies.

Biopharma · Biotechnology · Industrial · Manufacturing

Responsibilities

Work with AMD's architecture specialists to validate AI solutions for distributed training and inference workloads using AMD's ROCm software
Build cluster-scale automation for distributed training and inference workloads
Publish reference designs and benchmark numbers for AI workloads
Apply a data-driven approach to targeting optimization efforts
Design and develop new groundbreaking AMD technologies
Participate in new ASIC and hardware bring-ups
Develop technical relationships with peers and partners

Qualifications

AI solutions validation · Distributed training · MLOps · Python · Kubernetes · Performance profiling · Linux · Effective communication · Problem-solving

Required

Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent

Preferred

Solid experience with complex compute systems used in AI and HPC deployments, and with backend network designs in RDMA clusters
Experience validating complex AI infrastructure (GPUs, networking, RoCEv2, UEC) and running benchmarks such as IBPerf and RCCL/NCCL tests
Experience training LLMs, MoE models, image generation models, and recommendation models with frameworks such as PyTorch, TensorFlow, Megatron-LM, and JAX, and running training performance benchmarks
Experience running inference workloads in AI clusters with inference frameworks such as vLLM and SGLang, and running inference performance benchmarks
Experience with distributed systems and schedulers such as Kubernetes and Slurm
Ability to write high-quality automation frameworks and scripts in Python or Go
Experience with performance profiling of CPUs and GPUs, and with debugging complex compute, network, and storage problems
Experience with AMD ROCm is a plus
Experience with Linux and Windows operating systems
Effective communication and problem-solving skills

Benefits

AMD benefits at a glance.

Company

Advanced Microdevices Pvt. Ltd. (India)

Advanced Microdevices (mdi) is a leader in innovative membrane technologies.

Funding

Current Stage
Late Stage

Leadership Team

Nalini Kant Gupta
Founder & Managing Director
Company data provided by Crunchbase