AMD · 7 hours ago
Post-Training Platform Infrastructure Engineer
AMD builds innovative products that enhance next-generation computing experiences. The company is seeking a systems-minded engineer to focus on post-training and inference infrastructure, with an emphasis on performance optimization and efficient resource utilization in large-scale model inference and reinforcement learning systems.
Embedded Software · Artificial Intelligence (AI) · Semiconductor · Cloud Computing · Electronics · Hardware · AI Infrastructure · Computer · Embedded Systems · GPU
Responsibilities
- Research and deeply understand modern LLM inference frameworks, including:
  - Architecture and design tradeoffs of P/D (prefill/decode) disaggregation
  - KV cache lifecycle, memory layout, eviction strategies, and reuse (see the sketch after this list)
  - KV cache offloading mechanisms across GPU, CPU, and storage backends
- Analyze and compare inference execution paths to identify:
  - Performance bottlenecks (latency, throughput, memory pressure)
  - Inefficiencies in scheduling, cache management, and resource utilization
- Develop and implement infrastructure-level features to:
  - Improve inference latency, throughput, and memory efficiency
  - Optimize KV cache management and offloading strategies
  - Enhance scalability across multi-GPU and multi-node deployments
- Apply the same research-driven approach to RL frameworks:
  - Study post-training and RL systems (e.g., policy rollouts, inference-heavy loops)
  - Debug performance and correctness issues in distributed RL pipelines
  - Optimize inference, rollout efficiency, and memory usage during training
- Collaborate with research and applied ML teams to:
  - Translate model-level requirements into infrastructure capabilities
  - Validate performance gains with benchmarks and real workloads
- Document findings, architectural insights, and best practices to guide future system design
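The KV cache work above can be made concrete with a toy model. The following is a minimal Python sketch of a two-tier KV cache with LRU eviction and CPU offload; all names (`KVCacheManager`, `gpu_capacity`) are hypothetical, and plain dicts stand in for real GPU/CPU memory pools. This is not AMD's stack or any specific framework's design.

```python
# Minimal two-tier KV cache sketch: LRU eviction from a bounded "GPU" tier
# into a "CPU" tier. All names are illustrative; plain dicts stand in for
# device memory pools.
from collections import OrderedDict


class KVCacheManager:
    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu_tier = OrderedDict()  # seq_id -> KV block, hot tier
        self.cpu_tier = {}             # offload target for evicted blocks

    def put(self, seq_id, block):
        self.gpu_tier[seq_id] = block
        self.gpu_tier.move_to_end(seq_id)                     # most recently used
        while len(self.gpu_tier) > self.gpu_capacity:
            victim, data = self.gpu_tier.popitem(last=False)  # evict LRU entry
            self.cpu_tier[victim] = data                      # offload, don't drop

    def get(self, seq_id):
        if seq_id in self.gpu_tier:
            self.gpu_tier.move_to_end(seq_id)                 # hit: refresh recency
            return self.gpu_tier[seq_id]
        if seq_id in self.cpu_tier:                           # miss: promote from CPU
            self.put(seq_id, self.cpu_tier.pop(seq_id))
            return self.gpu_tier[seq_id]
        return None


cache = KVCacheManager(gpu_capacity=2)
cache.put("req-1", b"kv-1")
cache.put("req-2", b"kv-2")
cache.put("req-3", b"kv-3")           # evicts req-1 to the CPU tier
assert cache.get("req-1") == b"kv-1"  # promoted back, evicting req-2
```

Real serving stacks typically manage fixed-size cache blocks (paged attention) rather than whole sequences, but the eviction and offload flow is analogous.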
Qualifications
Required
- Strong background in systems engineering, distributed systems, or ML infrastructure
- Hands-on experience with GPU-accelerated workloads and memory-constrained systems
- Solid understanding of LLM inference workflows, including:
  - Prefill vs. decode phases (see the toy example after this list)
  - Attention mechanisms and KV cache behavior
  - Multi-process / multi-GPU execution models
- Proficiency in Python and C++ (or similar systems languages)
- Experience debugging performance issues using profiling tools (GPU, CPU, memory)
- Ability to read, understand, and modify complex open-source codebases
- Strong analytical skills and comfort working in research-heavy, ambiguous problem spaces
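For reference, the prefill/decode distinction can be illustrated with a toy example: prefill processes the entire prompt in one pass and populates the KV cache, while each decode step attends over the cache and extends it by exactly one token. The token IDs and the "attention" reduction below are stand-ins, not a real model.

```python
# Toy illustration of the prefill/decode split in autoregressive inference.
# Integer token IDs and a sum() stand in for real tensors and attention.

def prefill(prompt_tokens):
    """Prefill: one pass over the whole prompt; builds the KV cache."""
    return list(prompt_tokens)        # one cached entry per prompt token

def decode_step(kv_cache, token):
    """Decode: attend over the cache, then append exactly one entry."""
    _ = sum(kv_cache)                 # stand-in for attention over cached KV
    kv_cache.append(token)            # cache grows by one token per step
    return token + 1                  # stand-in for the next sampled token

kv = prefill([101, 102, 103])         # compute-bound: batched over the prompt
tok = 104
for _ in range(4):                    # bandwidth-bound: one token at a time
    tok = decode_step(kv, tok)
print(len(kv))                        # 3 prompt entries + 4 decoded = 7
```

This asymmetry (compute-bound prefill vs. memory-bandwidth-bound decode) is what motivates the P/D disaggregation mentioned under Responsibilities.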
Preferred
- Direct experience with LLM inference frameworks or serving stacks
- Familiarity with:
  - GPU memory hierarchies (HBM, pinned memory, NUMA considerations)
  - KV cache compression, paging, or eviction strategies
  - Storage-backed offloading (NVMe, object stores, distributed file systems)
- Experience with distributed RL or post-training pipelines
- Knowledge of scheduling systems, async execution, or actor-based runtimes
- Contributions to open-source ML or systems projects
- Experience designing benchmarking suites or performance evaluation frameworks (see the harness sketch below)
- Bachelor's or master's degree in computer science, computer engineering, electrical engineering, or equivalent
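As a rough sketch of the benchmarking work mentioned above, the harness below reports median and p99 latency plus throughput for a callable under test; `run_inference` is a hypothetical placeholder, and a real suite would also sweep batch sizes, sequence lengths, and concurrency.

```python
# Minimal latency/throughput micro-benchmark harness (sketch).
# `run_inference` is a hypothetical stand-in for the system under test.
import statistics
import time

def run_inference():
    time.sleep(0.001)                 # placeholder workload

def benchmark(fn, warmup=10, iters=100):
    for _ in range(warmup):           # discard cold-start effects
        fn()
    latencies = []
    start = time.perf_counter()
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p99_ms": latencies[int(0.99 * (iters - 1))] * 1e3,
        "throughput_rps": iters / wall,
    }

print(benchmark(run_inference))
```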
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics processing units, processors, and media solutions.
H1B Sponsorship
AMD has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
- 2025: 836
- 2024: 770
- 2023: 551
- 2022: 739
- 2021: 519
- 2020: 547
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
Funding Rounds:
- 2025-10-06: Post-IPO Equity
- 2023-03-02: Post-IPO Equity
- 2021-06-29: Post-IPO Equity
Recent News
2026-02-06 · The Next Platform
Company data provided by Crunchbase