Senior Deep Learning Communication Architect jobs in United States
cer-icon
Apply on Employer Site
company-logo

NVIDIA · 1 week ago

Senior Deep Learning Communication Architect

NVIDIA is a leader in computer graphics and AI innovation, seeking a Senior Deep Learning Communication Architect to enhance the performance and scalability of deep learning systems. The role involves optimizing communication protocols, collaborating with hardware and software teams, and exploring innovative technologies to improve distributed deep learning training and inference.

AI InfrastructureArtificial Intelligence (AI)Consumer ElectronicsFoundational AIGPUHardwareSoftwareVirtual Reality
check
Growth Opportunities
check
H1B Sponsor Likelynote

Responsibilities

The software architecture group at NVIDIA has openings for a Deep Learning Communication Architect. We scale the DNN models and training/inference frameworks to systems with hundreds of thousands of nodes
Optimizing communication performance: Identify and eliminate bottlenecks in data transfer and synchronization during distributed deep learning training and inference
Designing efficient communication protocols: Develop and implement communication algorithms and protocols tailored for deep learning workloads, minimizing communication overhead and latency
Hardware and software co-craft: Collaborate with hardware and software teams to craft systems that effectively apply high-speed interconnects (e.g., NVLink, InfiniBand, SPC-X) and communication libraries (e.g., MPI, NCCL, UCX, UCC, NVSHMEM)
Exploring innovative communication technologies: Research and evaluate new communication technologies and techniques to enhance the performance and scalability of deep learning systems
Developing and implementing solutions: Build proofs-of-concept, conduct experiments, and perform quantitative modeling to validate and deploy new communication strategies

Qualification

Deep Learning CommunicationDNN TrainingInferenceC++ ProgrammingPython ProgrammingParallelism TechniquesCommunication ProtocolsGPU ComputingResearch SkillsTeam Collaboration

Required

A Ph.D., Masters, or BS in Computer Science (CS), Electrical Engineering (EE), Computer Science and Electrical Engineering (CSEE), or a closely related field or equivalent experience
6+ years of experience in Building DNNs, Scaling of DNNs, Parallelism of DNN frameworks, or deep learning training and inference workloads
Experience in evaluating, analyzing, and optimizing LLM training and inference performance of state-of-the-art models on cutting-edge hardware
Deep understanding of parallelism techniques, including Data Parallelism, Pipeline Parallelism, Tensor Parallelism, Expert Parallelism, and FSDP
Understanding of the emerging serving architectures like Disaggregated Serving and inference servers like Dynamo and Triton
Proficiency in developing code for one or more deep neural network (DNN) training and Inference frameworks, such as PyTorch, TensorRT-LLM, vLLM, SGLang
Strong programming skills in C++ and Python
Familiarity with GPU computing, including CUDA and OpenCL, and familiarity with InfiniBand and RoCE networks

Preferred

Prior contributions to one or more DNN training and Inference frameworks as part of your previous work experience
Deep understanding and contributions to the scaling of LLMs on large-scale systems

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Below presents additional info for your reference. (Data Powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Represents job field similar to this job
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-EARK Investment ManagementSoftBank Vision Fund
2023-05-09Grant· $5M
2022-08-09Post Ipo Equity· $65M
2021-02-18Post Ipo Equity

Leadership Team

leader-logo
Jensen Huang
Founder and CEO
linkedin
leader-logo
Michael Kagan
Chief Technology Officer
linkedin
Company data provided by crunchbase