Distinguished Software Architect - Deep Learning and HPC Communications jobs in United States

NVIDIA · 1 week ago

Distinguished Software Architect - Deep Learning and HPC Communications

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. They are seeking a Distinguished Software Architect to co-design next generation data center platforms, focusing on communication libraries for Deep Learning and HPC applications.

AI Infrastructure, Artificial Intelligence (AI), Consumer Electronics, Foundational AI, GPU, Hardware, Software, Virtual Reality
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Research new communication technologies (e.g. expand the GPUDirect technology portfolio) and design new features for our communication libraries
Propose innovative solutions in HW and SW for our next-gen platforms. You will co-design these solutions with the GPU, Networking, and SW architects and ensure seamless integration with the software stacks
Inspire changes based on quantitative data coming from proof-of-concepts or detailed technical analysis/modeling
Drive the adoption of new communication technologies across application verticals
Keep up with the latest DL research and collaborate with diverse internal and external teams, including DL researchers and customers

Qualifications

HPC, Parallel programming models, Communication runtime, GPU architecture, High performance networking, ML/DL fundamentals, C/C++ programming, Flexibility in communication

Required

PhD in Computer Science, Computer Engineering, or a related field, or strong equivalent experience; 15+ years of relevant experience in academia or industry
Expert in the following areas: HPC, parallel programming models (MPI, SHMEM), at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC), computer and system architecture, GPU architecture and CUDA
Deep understanding, from prior work experience, of various aspects of high performance networking: network technologies (InfiniBand, Ethernet), network design, network topologies, network debugging and performance analysis
Strong in at least a few of these areas: ML/DL fundamentals and how they tie to communications, parallel algorithms, fault tolerance and resiliency, competitive assessments, performance analysis and optimizations for parallel applications on large clusters, developing applications using DL Frameworks (PyTorch, TensorFlow)
Programming fluency with C or C++ for systems software development
Flexibility to work and communicate effectively across different HW/SW teams and timezones

Preferred

Industry-recognized leader in HPC/DL communications with a history of patents, publications, and conference talks and keynotes in areas relevant to this role
Influential role in industry standards (e.g. MPI, OpenSHMEM) and open source software (e.g. PyTorch, UCX, Open MPI)

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
Founder and CEO
Michael Kagan
Chief Technology Officer
Company data provided by crunchbase