Senior GPU Kubernetes Engineer jobs in United States

AMD · 12 hours ago

Senior GPU Kubernetes Engineer

AMD is a company focused on building products that accelerate next-generation computing experiences, from AI to gaming. They are seeking a Senior GPU Kubernetes Engineer to lead GPU operator development and optimize AI workloads in various environments.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Growth Opportunities
No H1B

Responsibilities

Lead GPU Operator development; implement topology-aware scheduling policies; optimize NUMA placement, PCIe locality, and memory bandwidth; and ensure robust integration with AMD’s ROCm drivers and runtimes
Design autoscaling logic for GPU-heavy inference and fine-tuning workloads, build monitoring and telemetry instrumentation, strengthen workload reliability, and develop scalable Helm charts and automation workflows
Collaborate closely with ROCm, platform, performance, and model teams to ensure end-to-end integration quality; troubleshoot across GPU runtimes, Kubernetes layers, and AI frameworks; influence AMD’s Kubernetes roadmap; and support deployment models across customer, partner, and ecosystem environments

Qualifications

Kubernetes expertise, GPU resource management, AI workload optimization, Helm proficiency, Operator/CRD development, NUMA understanding, Technical execution, Collaboration, Problem-solving

Required

Strong Kubernetes engineering expertise
Deep understanding of GPU resource management
Hands-on experience optimizing AI workloads in cloud and on-prem environments
Proven track record in problem-solving, collaboration, and technical execution

Preferred

Strong hands-on experience with Kubernetes GPU workloads, Operator/CRD development, scheduling plugins, and resource managers
Proficiency with Helm, Kustomize, Prometheus, Grafana, Fluentd/Fluent Bit, and Argo CD is valuable
Deep understanding of NUMA, GPU topology, affinity/anti-affinity rules, and multi-GPU inference strategies is essential
Familiarity with distributed inference frameworks such as vLLM, Triton, KServe, or Ray, along with experience deploying LLM workloads, is highly desirable
Knowledge of ROCm, AMD MI300/MI325 platforms, OpenShift, KubeVirt, or enterprise Kubernetes systems provides a strong advantage

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

Funding

Current Stage
Public Company
Total Funding
unknown
Key Investors
OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity

Leadership Team

Lisa Su, Chair & CEO
Mark Papermaster, CTO and EVP
Company data provided by Crunchbase