
Advanced Microdevices Pvt. Ltd. (India) · 1 month ago

MTS - Distributed Inferencing Software Engineer - AI Models

Advanced Micro Devices, Inc. builds products that enhance computing experiences across various domains. The company is seeking a Distributed Inferencing Software Engineer to optimize AI models on distributed systems, collaborating with GPU library teams to improve performance and scalability.

Biopharma, Biotechnology, Industrial Manufacturing

Responsibilities

Enable and benchmark AI models on distributed systems
Work in a distributed computing setting to optimize for scale-up (multi-GPU), scale-out (multi-node), and scale-across systems
Collaborate with internal GPU library teams to analyze and optimize distributed workloads for high throughput and low latency
Build expertise in parallelization strategies for AI workloads and apply them to get the best performance from each configuration
Contribute to distributed model management, model zoos, monitoring, benchmarking and documentation

Qualifications

C++, Python, GPU computing, AI framework engineering, Distributed systems, Performance analysis, SLURM, K8s, Debugging, Test design

Required

Strong technical and analytical skills in C++/Python AI development, including solving performance problems and investigating scalability on multi-GPU, multi-node clusters
Ability to work as part of a team while also working independently: defining goals and scope and leading your own development effort

Preferred

Knowledge of GPU computing (HIP, CUDA, OpenCL)
AI framework engineering experience (vLLM, SGLang, Llama.cpp)
Understanding of KV cache transfer mechanisms and options (Mooncake, NIXL/RIXL) and of expert parallelization (DeepEP/MORI/PPLX-Garden)
Excellent C/C++/Python programming and software design skills, including debugging, performance analysis, and test design
Experience running workloads, especially AI models, on large-scale heterogeneous clusters
Familiarity with clusters and orchestration software (SLURM, K8s)

Benefits

AMD benefits at a glance.

Company

Advanced Microdevices Pvt. Ltd. (India)

Advanced Microdevices (mdi) is a leader in innovative membrane technologies.

Funding

Current Stage
Late Stage

Leadership Team

Nalini Kant Gupta
Founder & Managing Director
Company data provided by Crunchbase