Senior Software Development Engineer - LLM Kernel & Inference Systems jobs in United States

AMD · 5 hours ago

Senior Software Development Engineer - LLM Kernel & Inference Systems

AMD builds products that accelerate next-generation computing experiences, including AI and data centers. The company is seeking a Senior Member of Technical Staff to lead in Large Language Model (LLM) inference and kernel optimization for AMD GPUs, focusing on optimizing GPU kernels and inference runtimes.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor
Growth Opportunities · H1B Sponsor Likely

Responsibilities

Optimize LLM Inference Frameworks: Drive performance improvements in LLM inference frameworks such as vLLM, SGLang, and PyTorch for AMD GPUs, contributing both internally and upstream
LLM-Aware Kernel Development: Design and optimize GPU kernels critical to LLM inference, including attention, GEMMs, KV cache operations, MoE components, and memory-bound kernels
Distributed LLM Inference at Scale: Design, implement, and tune multi-GPU and multi-node inference strategies, including TP/PP/EP hybrids, continuous batching, KV cache management, and disaggregated serving
Model–System Co-Design: Collaborate with model and framework teams to align LLM architectures with hardware-aware optimizations, improving real-world inference efficiency
Compiler & Runtime Optimization: Leverage compiler technologies (LLVM, ROCm, Triton, graph compilers) to improve kernel fusion, memory access patterns, and end-to-end inference pipelines
End-to-End Inference Pipeline Optimization: Optimize the full inference stack, from model execution graphs and runtimes to scheduling, batching, and deployment
Open-Source Leadership: Engage with open-source maintainers to upstream optimizations, influence roadmap direction, and ensure long-term sustainability of contributions
Engineering Excellence: Apply best practices in software engineering, including performance benchmarking, testing, debugging, and maintainability at scale
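The KV cache management mentioned in the responsibilities above typically refers to block-based (paged) allocation, an idea popularized by vLLM: instead of reserving memory for each sequence's maximum length up front, fixed-size blocks are handed out on demand. As a rough illustration only (the class, method names, and block size here are hypothetical, not AMD's or vLLM's actual API):

```python
class PagedKVCache:
    """Toy paged KV cache allocator (illustrative sketch, not a real API).

    Sequences receive fixed-size physical blocks lazily, so idle capacity
    can be shared across many concurrent requests."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size                 # tokens per block
        self.free_blocks = list(range(num_blocks))   # pool of physical block ids
        self.tables = {}                             # seq_id -> list of block ids
        self.lengths = {}                            # seq_id -> tokens written

    def append_token(self, seq_id: int) -> int:
        """Reserve cache space for one new token; return the block used."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:                 # first token, or block full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt or swap a sequence")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1
        return table[-1]

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

A production allocator would also handle block sharing for prefix caching and copy-on-write, but the demand-paged core shown here is the essential memory-management idea behind continuous batching.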

Qualifications

LLM inference frameworks · GPU kernel development · Distributed inference systems · Compiler technologies · Python · C++ · High-performance computing · Open-source contributions · Performance analysis · Debugging skills · Collaboration

Required

Deep LLM domain knowledge
Strong understanding of end-to-end inference systems
Ability to reason about attention, KV cache, batching, parallelism strategies, and how they map to GPU kernels and hardware characteristics
Ability to thrive in ambiguous problem spaces
Ability to independently define technical direction
Ability to consistently deliver measurable performance gains
Strong execution with thoughtful upstream collaboration
High bar for software quality
Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field

Preferred

Deep understanding of Large Language Model inference, including attention mechanisms, KV cache behavior, batching strategies, and latency/throughput trade-offs
Hands-on experience with vLLM, SGLang, or similar inference systems (e.g., FasterTransformer), with demonstrated performance tuning
Proven experience optimizing GPU kernels for deep learning workloads, particularly inference-critical paths
Experience designing and tuning large-scale inference systems across multiple GPUs and nodes
Track record of meaningful upstream contributions to ML, LLM, or systems-level open-source projects
Strong proficiency in Python and C++, with deep experience in performance analysis, profiling, and debugging complex systems
Experience running and optimizing large-scale workloads on heterogeneous GPU clusters
Solid foundation in compiler concepts and tooling (LLVM, ROCm, Triton), applied to ML kernel and runtime optimization
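The tensor-parallel (TP) experience called for above amounts to sharding weight matrices across GPUs so each rank computes a slice of the output, followed by a collective. A minimal pure-Python sketch, with no real GPUs involved: the "all-gather" is simulated by concatenating each rank's column shard, and all function names are illustrative.

```python
def matmul(x, w):
    """Naive matmul: x is an m-by-k list of rows, w is k-by-n."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def column_parallel(x, w, ranks):
    """Column-parallel linear layer: shard w's output columns across
    `ranks` virtual devices, compute each shard locally, then gather.

    Assumes the output dimension divides evenly by `ranks`."""
    n = len(w[0])
    shard = n // ranks
    partials = []
    for r in range(ranks):
        # Rank r holds only its slice of w's columns.
        w_r = [row[r * shard:(r + 1) * shard] for row in w]
        partials.append(matmul(x, w_r))
    # Simulated all-gather: concatenate each row's column shards in rank order.
    return [[v for p in partials for v in p[i]] for i in range(len(x))]
```

The point of the sketch is the invariant real TP kernels must preserve: the sharded computation plus the collective reproduces the unsharded matmul exactly, while each device touches only 1/`ranks` of the weights.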

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

H1B Sponsorship

AMD has a track record of sponsoring H1B visas, though this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor.)
Trends of Total Sponsorships
2025: 836
2024: 770
2023: 551
2022: 739
2021: 519
2020: 547

Funding

Current Stage
Public Company
Total Funding
unknown
Key Investors
OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity

Leadership Team

Lisa Su
Chair & CEO
Mark Papermaster
CTO and EVP
Company data provided by crunchbase