Advanced Microdevices Pvt. Ltd. (India) · 9 hours ago

Senior Software Development Engineer – SGLang and Inference Stack

Santa Clara, CA

Full-time

Onsite

Senior Level

Advanced Micro Devices, Inc. builds products that accelerate next-generation computing experiences. The Senior Software Development Engineer will develop and optimize deep learning frameworks for AMD GPUs, improve their performance, and collaborate with internal and external teams to drive contributions to the AI software ecosystem.

Biotechnology · Industrial · Pharmaceutical · Manufacturing · Biopharma

Responsibilities

Optimize Deep Learning Frameworks: Enhance performance of frameworks like TensorFlow, PyTorch, and SGLang on AMD GPUs via upstream contributions in open-source repositories

Develop and Optimize Deep Learning Models: Profile, analyze, modify, and tune large-scale training and inference models for optimal performance on AMD hardware. Provide day-0 support for many state-of-the-art (SOTA) models, such as DeepSeek 3.2 and Kimi K2.5

GPU Kernel Development: Design, implement, and optimize high-performance GPU kernels using HIP, Triton, TileLang or other DSLs for AI operator efficiency

Collaborate with GPU Library and Compiler Teams: Work closely with internal compiler and GPU math library teams to integrate kernel-level optimizations and align them with full-stack performance goals. Initiate and assist with codegen optimizations at different levels

Contribute to SGLang Development: Support optimization, feature development, and scaling of the SGLang framework across AMD GPU platforms for LLM serving, multimodal serving, and RL training

Distributed System Optimization: Tune and scale performance across both multi-GPU (scale-up) and multi-node (scale-out) environments, including inference parallelism, prefill-decode disaggregation, Wide-EP, and collective communication strategies

Graph Compiler Integration: Integrate and optimize runtime execution through graph compilers such as XLA, TorchDynamo, or custom pipelines

Open-Source Collaboration: Partner with external maintainers to understand framework needs, propose optimizations, and upstream contributions effectively

Apply Engineering Best Practices: Leverage modern software engineering practices in debugging, profiling, test-driven development, and CI/CD integration

Qualifications

GPGPU C++ · Deep Learning Frameworks · SGLang · Python · GPU Kernel Development · Compiler Knowledge · Distributed Systems · Software Engineering Practices · Problem-Solving · Collaboration

Required

Skilled engineer with strong technical and analytical expertise in GPGPU C++, Triton, TileLang, or other DSL development within Linux environments

Ability to define goals, manage development efforts, and deliver high-quality solutions

Strong problem-solving skills

Proactive approach

Keen understanding of software engineering best practices

Bachelor's and/or Master's Degree in Computer Science, Computer Engineering, Electrical Engineering, Physics or a related field

Preferred

Strong Programming Skills: Proficient in C++ and/or Python (PyTorch, Triton, TileLang), with demonstrated ability to code, debug, profile, and optimize performance-critical code

SGLang and LLM Optimization: Hands-on experience with SGLang or similar LLM inference frameworks is highly preferred

Compiler and GPU Architecture Knowledge: Background in compiler design or familiarity with technologies like LLVM, MLIR, or ROCm is a plus

Heterogeneous System Workloads: Experience running and scaling workloads on large-scale, heterogeneous clusters (CPU + GPU) using distributed training or inference strategies

AI Framework Integration: Experience contributing to or integrating optimizations into deep learning frameworks such as PyTorch, SGLang, vLLM, Slime, and VeRL

GPGPU Computing: Working knowledge of HIP, CUDA, Triton, TileLang or other GPU programming models; experience with GCN/CDNA architecture preferred

Benefits

AMD benefits at a glance.

Company

Advanced Microdevices Pvt. Ltd. (India)

Advanced Microdevices (mdi) is a leader in innovative membrane technologies.

Founded in 1976

Ambala, Haryana, IND

501-1000 employees

https://mdimembrane.com

Funding

Current Stage

Late Stage

Leadership Team

Nalini Kant Gupta

Founder & Managing Director

Company data provided by crunchbase