AMD · 4 hours ago
HPC Systems Staff Engineer
AMD is a company focused on building products that accelerate next-generation computing experiences. They are seeking an HPC Systems Engineer responsible for designing, developing, and administering High-Performance Computing infrastructure, GPU clusters, and AI workload schedulers.
Responsibilities
Develop, implement, and maintain GPU-based clusters, ensuring optimal performance
Administer ML/AI platforms (distributed ML services, LLMs, and AI inferencing) by managing deployments, resource allocation, monitoring, and security
Automate system provisioning and cluster management end to end
Collaborate with cross-functional teams to address AI infrastructure requirements, support AI-related projects, and provide technical expertise
Monitor and evaluate the performance of AI systems and clusters, ensuring that they adhere to industry best practices and meet company standards
Use AI/ML to continuously improve the internal processes and tools the team uses for end-to-end service delivery
Qualifications
Required
Design, development, and administration of High-Performance Computing (HPC) infrastructure
Development, implementation, and maintenance of GPU-based clusters
Administration of ML/AI platforms (distributed ML services, LLMs, and AI inferencing)
Automation of system provisioning and cluster management end to end
Collaboration with cross-functional teams to address AI infrastructure requirements
Monitoring and evaluation of the performance of AI systems and clusters
Use of AI/ML to continuously improve internal processes and tools
Preferred
Experience developing Python-based AI apps and UIs
HPC infrastructure engineering for the AI/HPC domain
SLURM and Kubernetes management
Managing GPU clusters and optimizing GPU-based services, tools, and software
Experience creating web services with an HPC backend (such as AI)
Proficiency in RoCEv2, K8s, KVM, Ubuntu, Python, Shell, GPU drivers, and cluster interconnects with 400G networking
Demonstrated experience with AI workload schedulers and allocation optimization
Automation and monitoring tools: Ansible/SaltStack, Terraform, Prometheus, Grafana
Strong organizational, problem-solving, and troubleshooting skills
Excellent verbal and written communication skills
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Company data provided by Crunchbase