Senior Hardware Engineer, GPU & PCIe jobs in United States

CoreWeave · 1 day ago

Senior Hardware Engineer, GPU & PCIe

CoreWeave is The Essential Cloud for AI™, delivering a platform of technology and tools that enable innovators to build and scale AI with confidence. The Senior Hardware Engineer will focus on GPU and PCIe troubleshooting, playing a crucial role in the design, development, and optimization of server hardware infrastructure while collaborating with cross-functional teams and external vendors.

Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Information Technology, Machine Learning
No H1B · U.S. Citizen Only

Responsibilities

Troubleshoot complex GPU and PCIe related failures
Partner with external vendors on failure analysis
Track component RMAs
Develop and maintain hardware/firmware management services
Automate all aspects of the server hardware lifecycle
Serve as the senior point of contact for hardware escalation and troubleshooting
Collaborate with cross-functional teams to define hardware requirements, specifications, and system architecture, and to build playbooks for issue identification and resolution
Create and maintain accurate documentation of hardware designs, specifications, test procedures, and results
Analyze and optimize the performance of hardware systems, identify bottlenecks, and propose improvements for enhanced efficiency
Establish processes for internal hardware testing, deployment, performance optimization and troubleshooting

Qualifications

GPU troubleshooting, PCIe troubleshooting, Ansible/Python, Data center GPU diagnostics, Server hardware knowledge, Analytical skills, Collaboration skills, Documentation skills, Problem-solving skills

Required

5+ years of prior experience supporting and troubleshooting data center class GPUs (H100 or newer, including InfiniBand and NVLink)
Proficiency in Ansible/Python and experience interacting programmatically with server BMCs using IPMI or Redfish (preferably Redfish)
Experience using, integrating, and automating data center class GPU diagnostics and troubleshooting tools, including observability platforms like Prometheus and Grafana
In-depth knowledge of server hardware, components, and management technologies, particularly GPUs and PCIe devices
Proven ability to stay updated with the latest industry technologies and trends
Previous experience collaborating with hardware vendors to identify novel issues, generate operational playbooks, create alerts and drive issue resolution to completion
Strong passion for automation, with a commitment to automating processes comprehensively
Excellent documentation skills and attention to detail
Strong analytical and problem-solving abilities

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to Participate in Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Funding

Current Stage
Public Company
Total Funding
$23.37B
Key Investors
Jane Street Capital, Stack Capital, Coatue
2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary

Leadership Team

Michael Intrator
Chief Executive Officer
Nitin Agrawal
Chief Financial Officer
Company data provided by Crunchbase