AMD · 1 day ago
AI Validation Engineer
AMD is a company focused on building innovative products that enhance computing experiences. They are seeking an AI Validation Engineer to develop and execute methodologies for system-level validation on machine learning systems, collaborating with various teams to improve quality and automation processes.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Responsibilities
Lead the definition of test procedures to validate the design of system components on the next generation of machine learning architectures. This covers all server components, including CPU, GPU, memory, BIOS, BMC, IO, storage, and networking
Lead efforts to validate Scale-up and Scale-out architectures, including definition of test plans at cluster-level as well as writing code to support the validation
Translate system specs into a robust system integration test plan
Develop complex/critical test content as well as monitoring/debug/root-cause SW mechanisms that others can use in their plans
Establish methods to validate, monitor and root-cause errors at cluster-level
Make improvements to system level integration test strategies, methodologies, and processes
Collaborate with customer and multi-functional HW and SW teams to debug and resolve complex issues
Investigate, profile, and enable test content for a wide variety of system domains, and integrate benchmarks and proxies for customer workloads into our own test frameworks. These range from industry-standard benchmarks to state-of-the-art training and inference applications
Develop and improve automation features according to requirements
Monitor and analyze the execution of automated tests at scale (for hundreds or thousands of systems)
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent
Develop and execute methodologies and test content for system-level hardware, firmware and software validation on machine learning systems built with AMD's technologies
Work closely with architecture, design, and post-silicon teams to identify and resolve complex system issues
Develop automation, refine validation processes, execute test cases, analyze data and drive system-level quality across AMD's product portfolio
Preferred
Prior experience working on HPC or Machine Learning HW systems for large data centers
Several years of experience writing software in languages such as Python and/or C/C++
Experience with post-silicon system integration and system testing
Debugging skills at the SoC (System on a Chip), system, and cluster levels
Experience with Computer Architecture concepts and silicon features, particularly on machine learning systems
Enthusiasm for computing and excellent knowledge of current machine learning technologies in the data center
Effective communication skills, including influencing and working across large multi-functional HW, SW, and architecture teams
Benefits
AMD benefits at a glance.
Company
AMD
Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.
Funding
Current Stage: Public Company
Total Funding: unknown
Key Investors: OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity
Company data provided by Crunchbase