System Level Debug Engineer – Data Center GPU jobs in United States

AMD · 20 hours ago

System Level Debug Engineer – Data Center GPU

AMD is a leading company in the tech industry focused on building products that enhance next-generation computing experiences. They are seeking an experienced System Level Debug Engineer for their Data Center GPU organization to lead efforts in debugging and validating complex systems, ensuring optimal performance and collaboration across various technical teams.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Computer, Embedded Systems, GPU, Hardware, Semiconductor
Growth Opportunities
No H1B

Responsibilities

Act as a debug/triage engineer with an understanding of industry tools for root-causing complex issues
Understanding of GPU/System level HW and SW flow
Ability to probe parts of a board; check electrical and power currents and validate a system
Provide leadership for driving to root cause issues
Communicate / Document flows and methods of bring-up, boot-up, system initialization and debug
Lead technical presentations demonstrating a good understanding of application, data, infrastructure, architecture expertise and application systems design
Collaborate with application and infrastructure architects and be responsible for defining, designing, and delivering the technical architectures, patterns, technical quality, risks, fitness for purpose, and operability of technical architecture solutions
Be a leader and mentor to the operation team; be hands-on and lead by example
Troubleshoot and solve technical issues hands-on; own the problem and drive it to resolution
Proactively support a team culture that fosters knowledge sharing, excellence, and collaboration

Qualifications

System debug, GPU architecture, Hardware debugging, C/C++ programming, Agile methodologies, Scripting languages, Revision control, Database development, Communication skills, Team leadership, Problem solving

Required

Strong technical background to contribute to all aspects of the software development process
Experience in debugging of complex HW/FW issues
Understanding the flow of a GPU through the different layers of a system
Ability to validate the items connecting to the GPU SoC (PCIe, VRs, RMs, retimers, HBM, internal networking)
Excellent communication skills
Ability to drive any issues encountered to root cause and closure
Hands-on experience with hardware in a data center environment
Bachelor's or Master's degree in Computer Science or a related field strongly preferred, plus a minimum of 8 years of experience in system- or SoC-level debug and triage

Preferred

Significant experience in SoC and/or System debug of complex issues
Develop / Document debug capabilities on a given SOC and System
Go-to person for debugging issues in production-level platform validation
Collaborate with internal teams on root causing issues, finding optimum resolutions
Hands-on experience using industry debug tools and scopes, as well as examining board-level power
Proven experience with C/C++
Demonstrable experience in facilitating Agile, Scrum or Kanban
Skilled in scripting languages such as Perl, Ruby, and Shell script
Proficient with revision control (Git, SVN, and CVS)
Experience crafting and supporting cloud environments, including IaaS and PaaS
Database development, PostgreSQL, Oracle, MS SQL Server
Good balance of hardware, architecture, and software expertise
Proven ability to drive resolution of critical problems within a lab or data center
Relationships with external customers/partners, with the ability to help resolve problems in their data centers
Relationships with external customers/partners, with the ability to work through manufacturing issues and failures
Relationships with external customers/partners, with the ability to define requirements for manufacturing validation

Benefits

AMD benefits at a glance.

Company

Advanced Micro Devices is a semiconductor company that designs and develops graphics units, processors, and media solutions.

Funding

Current Stage
Public Company
Total Funding
unknown
Key Investors
OpenAI, Daniel Loeb
2025-10-06: Post-IPO Equity
2023-03-02: Post-IPO Equity
2021-06-29: Post-IPO Equity

Leadership Team

Lisa Su
Chair & CEO
Mark Papermaster
CTO and EVP
Company data provided by Crunchbase