AMD
Senior Software Development Engineer - LLM Kernel & Inference Systems
AMD builds products that accelerate next-generation computing experiences across AI, data centers, and beyond. The company is seeking a Senior Member of Technical Staff to lead Large Language Model (LLM) inference and kernel optimization for AMD GPUs, with a focus on GPU kernels and inference runtimes.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor
Responsibilities
Optimize LLM Inference Frameworks: Drive performance improvements in LLM inference frameworks such as vLLM, SGLang, and PyTorch for AMD GPUs, contributing both internally and upstream
LLM-Aware Kernel Development: Design and optimize GPU kernels critical to LLM inference, including attention, GEMMs, KV cache operations, MoE components, and memory-bound kernels (a kernel sketch follows this list)
Distributed LLM Inference at Scale: Design, implement, and tune multi-GPU and multi-node inference strategies, including TP / PP / EP hybrids, continuous batching, KV cache management, and disaggregated serving (a tensor-parallel sketch also follows this list)
Model–System Co-Design: Collaborate with model and framework teams to align LLM architectures with hardware-aware optimizations, improving real-world inference efficiency
Compiler & Runtime Optimization: Leverage compiler technologies (LLVM, ROCm, Triton, graph compilers) to improve kernel fusion, memory access patterns, and end-to-end inference pipelines
End-to-End Inference Pipeline Optimization: Optimize the full inference stack, from model execution graphs and runtimes to scheduling, batching, and deployment
Open-Source Leadership: Engage with open-source maintainers to upstream optimizations, influence roadmap direction, and ensure long-term sustainability of contributions
Engineering Excellence: Apply best practices in software engineering, including performance benchmarking, testing, debugging, and maintainability at scale
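To make the kernel and compiler items above concrete, the following is a minimal, illustrative Triton kernel: a fused, numerically stable row-wise softmax of the kind that sits inside attention score computation. It is a generic sketch, not AMD code; the shapes, block sizes, and the helper name row_softmax are assumptions for illustration. Triton is relevant here because its kernels can target AMD GPUs through the ROCm backend.

import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, in_row_stride, out_row_stride,
                   BLOCK_SIZE: tl.constexpr):
    # One program instance handles one row of the score matrix.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    # Load the row once; masked lanes read -inf so they vanish in the max/sum.
    x = tl.load(in_ptr + row * in_row_stride + cols, mask=mask, other=-float("inf"))
    # Numerically stable softmax, fused into a single memory pass.
    x = x - tl.max(x, axis=0)
    num = tl.exp(x)
    out = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * out_row_stride + cols, out, mask=mask)

def row_softmax(x: torch.Tensor) -> torch.Tensor:
    n_rows, n_cols = x.shape
    y = torch.empty_like(x)
    # BLOCK_SIZE must cover the row and be a power of two for tl.arange.
    softmax_kernel[(n_rows,)](y, x, n_cols, x.stride(0), y.stride(0),
                              BLOCK_SIZE=triton.next_power_of_2(n_cols))
    return y

Production attention kernels avoid materializing whole rows by tiling with an online softmax (the FlashAttention approach); this sketch only shows the single-pass fusion idea.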
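The distributed-inference item above centers on tensor parallelism (TP). The sketch below simulates column-parallel sharding of a linear layer inside a single process, with torch.chunk standing in for per-rank weight shards and torch.cat standing in for the all-gather a real multi-GPU deployment would issue; the shapes and the function name column_parallel_matmul are illustrative assumptions.

import torch

def column_parallel_matmul(x: torch.Tensor, w: torch.Tensor, tp_size: int) -> torch.Tensor:
    # Shard the weight's output (column) dimension across tp_size simulated ranks.
    shards = torch.chunk(w, tp_size, dim=1)
    # Each "rank" runs a smaller local GEMM against its own shard.
    partials = [x @ shard for shard in shards]
    # Concatenating the partial outputs recovers the full result; on real
    # hardware this step is an all-gather across the tensor-parallel group.
    return torch.cat(partials, dim=-1)

x = torch.randn(4, 512)
w = torch.randn(512, 2048)
assert torch.allclose(column_parallel_matmul(x, w, tp_size=4), x @ w, atol=1e-4)

The same decomposition is why per-rank GEMMs shrink as the TP degree grows while the math stays exact.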
Qualifications
Required
Deep LLM domain knowledge
Strong understanding of end-to-end inference systems
Ability to reason about attention, KV cache, batching, parallelism strategies, and how they map to GPU kernels and hardware characteristics (a KV-cache sizing sketch follows this section)
Ability to thrive in ambiguous problem spaces
Ability to independently define technical direction
Ability to consistently deliver measurable performance gains
Strong execution with thoughtful upstream collaboration
High bar for software quality
Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field
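As a worked example of the KV-cache reasoning required above, here is a back-of-the-envelope sizing helper. The formula (two cached tensors, K and V, per layer, per KV head, per token) is standard; the Llama-2-7B-like configuration in the example (32 layers, 32 KV heads, head dimension 128) is an assumption for illustration, not a figure from this posting.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, dtype_bytes=2):
    # Two cached tensors (K and V) per layer, per KV head, per token, per sequence.
    # GQA (fewer KV heads) or an fp8 KV cache shrinks this proportionally.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * dtype_bytes

# Assumed Llama-2-7B-like config: 32 layers, 32 KV heads, head_dim 128.
# At batch 8 and a 4096-token context in fp16, the KV cache alone is 16 GiB.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=4096, batch_size=8)
print(f"{size / 2**30:.1f} GiB")  # -> 16.0 GiB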
Preferred
Deep understanding of Large Language Model inference, including attention mechanisms, KV cache behavior, batching strategies, and latency/throughput trade-offs (see the batching sketch after this list)
Hands-on experience with vLLM, SGLang, or similar inference systems (e.g., FasterTransformer), with demonstrated performance tuning
Proven experience optimizing GPU kernels for deep learning workloads, particularly inference-critical paths
Experience designing and tuning large-scale inference systems across multiple GPUs and nodes
Track record of meaningful upstream contributions to ML, LLM, or systems-level open-source projects
Strong proficiency in Python and C++, with deep experience in performance analysis, profiling, and debugging complex systems
Experience running and optimizing large-scale workloads on heterogeneous GPU clusters
Solid foundation in compiler concepts and tooling (LLVM, ROCm, Triton), applied to ML kernel and runtime optimization
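Batching strategy, flagged in the first preferred qualification above, is usually discussed in terms of continuous batching, which vLLM and SGLang both implement. The toy scheduler below is a conceptual sketch only, not any framework's actual scheduler: requests join the running batch as soon as a slot frees, so short requests are not held behind long ones.

import random
from collections import deque

def continuous_batching(requests, max_batch=4):
    # requests: iterable of (request_id, tokens_to_generate)
    waiting = deque(requests)
    running, step = [], 0
    while waiting or running:
        # Admit new requests the moment a batch slot frees up, instead of
        # waiting for the whole batch to drain (static batching).
        while waiting and len(running) < max_batch:
            running.append(list(waiting.popleft()))
        # One decode iteration advances every in-flight request by one token.
        for req in running:
            req[1] -= 1
        finished = [req for req in running if req[1] == 0]
        running = [req for req in running if req[1] > 0]
        for rid, _ in finished:
            print(f"step {step}: request {rid} finished")
        step += 1

continuous_batching([(i, random.randint(3, 10)) for i in range(8)])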
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
H1B Sponsorship
AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart omitted; the highlighted category represents job fields similar to this one)
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
Funding rounds:
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Recent News
PCMag.com, 2026-01-23
Company data provided by crunchbase