AMD
Senior Software Development Engineer - LLM Kernel & Inference Systems
AMD builds products that accelerate next-generation computing experiences across AI, data centers, and beyond. The company is seeking a Senior Member of Technical Staff to lead Large Language Model (LLM) inference and kernel optimization for AMD GPUs, with a focus on GPU kernels and inference runtimes.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Computer · Embedded Systems · GPU · Hardware · Semiconductor
Responsibilities
Optimize LLM Inference Frameworks: Drive performance improvements in LLM inference frameworks such as vLLM, SGLang, and PyTorch for AMD GPUs, contributing both internally and upstream
LLM-Aware Kernel Development: Design and optimize GPU kernels critical to LLM inference, including attention, GEMMs, KV cache operations, MoE components, and memory-bound kernels (a kernel sketch follows this list)
Distributed LLM Inference at Scale: Design, implement, and tune multi-GPU and multi-node inference strategies, including TP / PP / EP hybrids, continuous batching, KV cache management, and disaggregated serving (a tensor-parallel sketch also follows this list)
Model–System Co-Design: Collaborate with model and framework teams to align LLM architectures with hardware-aware optimizations, improving real-world inference efficiency
Compiler & Runtime Optimization: Leverage compiler technologies (LLVM, ROCm, Triton, graph compilers) to improve kernel fusion, memory access patterns, and end-to-end inference pipelines
End-to-End Inference Pipeline Optimization: Optimize the full inference stack, from model execution graphs and runtimes to scheduling, batching, and deployment
Open-Source Leadership: Engage with open-source maintainers to upstream optimizations, influence roadmap direction, and ensure long-term sustainability of contributions
Engineering Excellence: Apply best practices in software engineering, including performance benchmarking, testing, debugging, and maintainability at scale
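To make the kernel and compiler items above concrete, the following is a minimal, illustrative Triton kernel: a fused, numerically stable row-wise softmax of the kind that sits inside attention score computation. It is a generic sketch, not AMD code; the shapes, block sizes, and the helper name row_softmax are assumptions for illustration. Triton is relevant here because its kernels can target AMD GPUs through the ROCm backend.

import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, in_row_stride, out_row_stride,
                   BLOCK_SIZE: tl.constexpr):
    # One program instance handles one row of the score matrix.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    # Load the row once; masked lanes read -inf so they vanish in the max/sum.
    x = tl.load(in_ptr + row * in_row_stride + cols, mask=mask, other=-float("inf"))
    # Numerically stable softmax, fused into a single memory pass.
    x = x - tl.max(x, axis=0)
    num = tl.exp(x)
    out = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * out_row_stride + cols, out, mask=mask)

def row_softmax(x: torch.Tensor) -> torch.Tensor:
    n_rows, n_cols = x.shape
    y = torch.empty_like(x)
    # BLOCK_SIZE must cover the row and be a power of two for tl.arange.
    softmax_kernel[(n_rows,)](y, x, n_cols, x.stride(0), y.stride(0),
                              BLOCK_SIZE=triton.next_power_of_2(n_cols))
    return y

Production attention kernels avoid materializing whole rows by tiling with an online softmax (the FlashAttention approach); this sketch only shows the single-pass fusion idea.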
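The distributed-inference item above centers on tensor parallelism (TP). The sketch below simulates column-parallel sharding of a linear layer inside a single process, with torch.chunk standing in for per-rank weight shards and torch.cat standing in for the all-gather a real multi-GPU deployment would issue; the shapes and the function name column_parallel_matmul are illustrative assumptions.

import torch

def column_parallel_matmul(x: torch.Tensor, w: torch.Tensor, tp_size: int) -> torch.Tensor:
    # Shard the weight's output (column) dimension across tp_size simulated ranks.
    shards = torch.chunk(w, tp_size, dim=1)
    # Each "rank" runs a smaller local GEMM against its own shard.
    partials = [x @ shard for shard in shards]
    # Concatenating the partial outputs recovers the full result; on real
    # hardware this step is an all-gather across the tensor-parallel group.
    return torch.cat(partials, dim=-1)

x = torch.randn(4, 512)
w = torch.randn(512, 2048)
assert torch.allclose(column_parallel_matmul(x, w, tp_size=4), x @ w, atol=1e-4)

The same decomposition is why per-rank GEMMs shrink as the TP degree grows while the math stays exact.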
Qualifications
Required
Deep LLM domain knowledge
Strong understanding of end-to-end inference systems
Ability to reason about attention, KV cache, batching, parallelism strategies, and how they map to GPU kernels and hardware characteristics (a KV-cache sizing sketch follows this section)
Ability to thrive in ambiguous problem spaces
Ability to independently define technical direction
Ability to consistently deliver measurable performance gains
Strong execution with thoughtful upstream collaboration
High bar for software quality
Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field
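As a worked example of the KV-cache reasoning required above, here is a back-of-the-envelope sizing helper. The formula (two cached tensors, K and V, per layer, per KV head, per token) is standard; the Llama-2-7B-like configuration in the example (32 layers, 32 KV heads, head dimension 128) is an assumption for illustration, not a figure from this posting.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, dtype_bytes=2):
    # Two cached tensors (K and V) per layer, per KV head, per token, per sequence.
    # GQA (fewer KV heads) or an fp8 KV cache shrinks this proportionally.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * dtype_bytes

# Assumed Llama-2-7B-like config: 32 layers, 32 KV heads, head_dim 128.
# At batch 8 and a 4096-token context in fp16, the KV cache alone is 16 GiB.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                      seq_len=4096, batch_size=8)
print(f"{size / 2**30:.1f} GiB")  # -> 16.0 GiB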
Preferred
Deep understanding of Large Language Model inference, including attention mechanisms, KV cache behavior, batching strategies, and latency/throughput trade-offs (see the batching sketch after this list)
Hands-on experience with vLLM, SGLang, or similar inference systems (e.g., FasterTransformer), with demonstrated performance tuning
Proven experience optimizing GPU kernels for deep learning workloads, particularly inference-critical paths
Experience designing and tuning large-scale inference systems across multiple GPUs and nodes
Track record of meaningful upstream contributions to ML, LLM, or systems-level open-source projects
Strong proficiency in Python and C++, with deep experience in performance analysis, profiling, and debugging complex systems
Experience running and optimizing large-scale workloads on heterogeneous GPU clusters
Solid foundation in compiler concepts and tooling (LLVM, ROCm, Triton), applied to ML kernel and runtime optimization
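Batching strategy, flagged in the first preferred qualification above, is usually discussed in terms of continuous batching, which vLLM and SGLang both implement. The toy scheduler below is a conceptual sketch only, not any framework's actual scheduler: requests join the running batch as soon as a slot frees, so short requests are not held behind long ones.

import random
from collections import deque

def continuous_batching(requests, max_batch=4):
    # requests: iterable of (request_id, tokens_to_generate)
    waiting = deque(requests)
    running, step = [], 0
    while waiting or running:
        # Admit new requests the moment a batch slot frees up, instead of
        # waiting for the whole batch to drain (static batching).
        while waiting and len(running) < max_batch:
            running.append(list(waiting.popleft()))
        # One decode iteration advances every in-flight request by one token.
        for req in running:
            req[1] -= 1
        finished = [req for req in running if req[1] == 0]
        running = [req for req in running if req[1] > 0]
        for rid, _ in finished:
            print(f"step {step}: request {rid} finished")
        step += 1

continuous_batching([(i, random.randint(3, 10)) for i in range(8)])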
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
H1B Sponsorship
AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart omitted; the highlighted category represents job fields similar to this one)
Trends of Total Sponsorships
2025 (836)
2024 (770)
2023 (551)
2022 (739)
2021 (519)
2020 (547)
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
Funding rounds:
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Recent News
PCMag.com, 2026-01-23
Company data provided by crunchbase