Senior Systems Design Engineer – AI/HPC jobs in United States

AMD · 2 hours ago

Senior Systems Design Engineer – AI/HPC

AMD is a company focused on building products that accelerate next-generation computing experiences. It is seeking a Senior Member of Technical Staff Systems Design Engineer to drive complex system-level debug and validation for AI and HPC workloads.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Growth Opportunities
No H1B
Hiring Manager
Elia Valladares, SHRM-CP

Responsibilities

Develop a deep understanding of running AI/HPC workloads at the single-node and cluster levels, and develop test suites and performance automation
Lead the debug and triage of issues found during the validation and production phases of our AI and HPC systems
Apply learnings, automation, and performance tests toward developing stronger test coverage, and enable strategies to accelerate issue debug and root-cause analysis
Work with multiple teams to develop and execute robust validation test plans at the functional and benchmark levels that meet our customer workloads and requirements
Contribute to technical innovation to improve AMD’s capabilities across validation, including tool and script development, technical and procedural methodology enhancement, and various internal and cross-functional technical initiatives
Engage with our customers and partners on co-validation plans and priorities as well as critical issue debugs

Qualifications

AI/HPC workloads, System level validation, Debugging methodologies, C/C++, Python, Kubernetes, Docker, Slurm, OS internals, Analytical skills, Problem-solving skills, Attention to detail, Self-starter

Required

Extensive systems understanding to drive complex system level debugs to resolution
Systems design and validation engineering expertise leveraged towards product development, validation, and root cause resolution
Expert system level engineer with understanding of system level validation for AI/HPC workloads
Experience with complex system-level issue debug and debug methodologies
Develop a deep understanding of running AI/HPC workloads at the single-node and cluster levels
Develop test suites and performance automation
Lead the debug and triage of issues found during the validation and production phases of AI and HPC systems
Apply learnings, automation, and performance tests toward developing stronger test coverage
Enable strategies to accelerate issue debug and root-cause analysis
Work with multiple teams to develop and execute robust validation test plans
Contribute to technical innovation to improve AMD's capabilities across validation
Engage with customers and partners on co-validation plans and priorities

Preferred

Programming/scripting skills (e.g. C/C++, Python)
Good understanding of AI and HPC workloads, industry standard benchmarks and frameworks for automation and validation
Experience with orchestration technologies such as Kubernetes, Docker, and Slurm
Ability to analyse and debug issues with workloads
Extensive experience in OS internals, system architecture, technical debug, and validation strategy
Strong analytical and problem-solving skills and pronounced attention to detail
Must be a self-starter, and able to independently drive tasks to completion

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

Funding

Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity

Leadership Team

Lisa Su, Chair & CEO
Mark Papermaster, CTO and EVP
Company data provided by Crunchbase