
Advanced Micro Devices, Inc. (AMD)

Senior Software Development Engineer – Distributed Inference

Santa Clara, CA

Full-time

Onsite

Senior Level

Advanced Micro Devices, Inc. (AMD) builds products that enhance computing experiences across a range of domains. AMD is seeking a Senior Software Development Engineer focused on distributed inference, who will optimize AI workloads and collaborate with internal teams to improve performance across distributed systems.


Responsibilities

Distributed AI Enablement and Benchmarking: Enable and benchmark AI models on large-scale distributed systems to evaluate performance, accuracy, and scalability

Scalable Systems Optimization: Optimize AI workloads across scale-up (multi-GPU), scale-out (multi-node), and scale-across distributed system configurations

Cross-Team Collaboration: Collaborate closely with internal GPU library teams to analyze and optimize distributed workloads for high throughput and low latency

Parallelization Strategies: Develop and apply optimal parallelization strategies for AI workloads to achieve best-in-class performance across diverse system configurations

Model Infrastructure and Management: Contribute to distributed model management systems, model zoos, monitoring frameworks, benchmarking pipelines, and technical documentation

Performance Monitoring and Visualization: Build and maintain real-time dashboards reporting performance, accuracy, and reliability metrics for internal stakeholders and external users

Qualifications

C++ · Python · Distributed Systems · AI Frameworks · Cluster Management · CI/CD Tools · Quality Assurance · Problem-Solving · Collaboration

Required

Strong technical expertise in C++/Python development

Experience solving performance problems and investigating scalability on multi-GPU, multi-node clusters

Passion for quality assurance, benchmarking, and automation in the AI/ML space

Ability to thrive in both collaborative and independent environments

Excellent problem-solving skills

Ownership in defining goals and delivering impactful solutions

Master's or PhD degree in Computer Science, Computer Engineering, or a related field, or equivalent practical experience

Preferred

Hands-on experience with AI inference or serving frameworks such as vLLM, SGLang, or llama.cpp

Understanding of KV cache transfer mechanisms and technologies (e.g., Mooncake, NIXL/RIXL) and expert-parallelism approaches (e.g., DeepEP, MORI, PPLX-Garden)

Strong C/C++ and Python skills, with experience in software design, debugging, performance analysis, and test development

Experience running AI workloads on large-scale, heterogeneous compute clusters

Familiarity with cluster management and orchestration platforms such as SLURM and Kubernetes (K8s)

Experience with GitHub, Jenkins, or similar CI/CD tools and modern development workflows

Benefits

AMD benefits at a glance.
