Senior Performance and Development Engineer jobs in United States

NVIDIA · 2 days ago

Senior Performance and Development Engineer

NVIDIA, a leading company in AI research, is seeking a Senior Performance and Development Engineer to optimize the efficiency and resiliency of ML workloads and to develop scalable AI infrastructure tools. The role involves building AI models, developing automation frameworks, and collaborating with software teams to improve performance in production environments.

Artificial Intelligence (AI) · Consumer Electronics · GPU · Hardware · Software · Virtual Reality
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Build AI models, tools, and frameworks that provide real-time application performance metrics that can be correlated with system metrics
Develop automation frameworks that empower applications to thoughtfully predict and overcome system/infrastructure failures, ensuring fault tolerance
Collaborate with software teams to pinpoint performance bottlenecks. Design, prototype, and integrate solutions that deliver demonstrable performance gains in production environments
Adapt and enhance communication libraries to seamlessly support innovative network topologies and system architectures
Design or adapt optimized storage solutions to boost Deep Learning efficiency, resilience, and developer productivity

Qualifications

PyTorch · Distributed systems · Fault tolerance · Performance analysis · CUDA · Communication libraries · Parallel programming · Communication

Required

BS/MS/PhD (or equivalent experience) in Computer Science, Electrical Engineering or a related field
12+ years of proven experience in the following areas:
Analyzing and improving performance of training applications using PyTorch or a similar framework
Building distributed software applications using collective communication libraries such as MPI, NCCL, or UCC
Constructing storage solutions for Deep Learning applications
Building automated, fault-tolerant distributed applications
Building tools for bottleneck analysis and automation of fault tolerance in distributed environments
Strong background in parallel programming and distributed systems
Experience analyzing and optimizing large-scale distributed applications
Excellent verbal and written communication skills

Preferred

Deep understanding of HPC and distributed system architecture
Hands-on experience in more than one of the above areas, especially with large SOTA AI models and with performance analysis and profiling of Deep Learning workloads
Comfortable navigating and working with the PyTorch codebase
Proven understanding of CUDA and GPU architecture

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E · ARK Investment Management · SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
Founder and CEO
Michael Kagan
Chief Technology Officer
Company data provided by Crunchbase