Senior GPU Kubernetes Engineer jobs in United States

Advanced Microdevices Pvt. Ltd. (India) · 5 hours ago

Senior GPU Kubernetes Engineer

Advanced Micro Devices, Inc. is dedicated to building innovative products that enhance computing experiences across various domains. The Senior GPU Kubernetes Engineer will lead GPU Operator development and optimize AI workloads, ensuring effective integration and deployment automation for the AMD Enterprise AI Suite.

Biopharma, Biotechnology, Industrial, Manufacturing
No H1B

Responsibilities

Lead GPU Operator development; implement topology-aware scheduling policies; optimize NUMA placement, PCIe locality, and memory bandwidth; and ensure robust integration with AMD’s ROCm drivers and runtimes
Design autoscaling logic for GPU-heavy inference and fine-tuning workloads, build monitoring and telemetry instrumentation, strengthen workload reliability, and develop scalable Helm charts and automation workflows
Collaborate closely with ROCm, platform, performance, and model teams to ensure end-to-end integration quality; troubleshoot across GPU runtimes, Kubernetes layers, and AI frameworks; influence AMD’s Kubernetes roadmap; and support deployment models across customer, partner, and ecosystem environments

Qualifications

Kubernetes, GPU resource management, AI workload optimization, Helm, Operator/CRD development, NUMA understanding, Multi-GPU inference, Technical execution, Collaboration, Problem-solving

Required

Strong Kubernetes engineering expertise
Deep understanding of GPU resource management
Hands-on experience optimizing AI workloads in cloud and on-prem environments
Proven track record in problem-solving, collaboration, and technical execution
BS, MS, or PhD in Computer Science or a related field, or equivalent

Preferred

Strong hands-on experience with Kubernetes GPU workloads
Operator/CRD development
Scheduling plugins and resource managers
Proficiency with Helm, Kustomize, Prometheus, Grafana, Fluentd/Fluent Bit, and Argo CD
Deep understanding of NUMA, GPU topology, affinity/anti-affinity rules, and multi-GPU inference strategies
Familiarity with distributed inference frameworks such as vLLM, Triton, KServe, or Ray
Experience deploying LLM workloads
Knowledge of ROCm, AMD MI300/MI325 platforms, OpenShift, KubeVirt, or enterprise Kubernetes systems

Benefits

AMD benefits at a glance.

Company

Advanced Microdevices Pvt. Ltd. (India)

Advanced Microdevices (mdi) is a leader in innovative membrane technologies.

Funding

Current Stage
Late Stage

Leadership Team

Nalini Kant Gupta
Founder & Managing Director
Company data provided by Crunchbase