Senior Deep Learning Framework Communications Engineer jobs in United States

NVIDIA · 19 hours ago

Senior Deep Learning Framework Communications Engineer

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. They are seeking a motivated Deep Learning engineer to integrate advanced communication technologies into AI stacks and improve communication performance between GPUs for AI applications.

AI Infrastructure, Artificial Intelligence (AI), Consumer Electronics, Foundational AI, GPU, Hardware, Software, Virtual Reality
Growth Opportunities
H1B Sponsor Likely
Hiring Manager
Bella Yanovsky

Responsibilities

Integrate new communication library features into AI frameworks, from PoC to performance analysis to production
Perform deep analysis of AI workloads and frameworks to identify multi-GPU communication requirements and opportunities. Collaborate hands-on with teams working on the latest AI models
Improve AI compilers to hide communications or perform automatic fusion
Conduct in-depth AI workload performance characterization on multi-GPU clusters
Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads
Author custom communication or fused compute-communication kernels to showcase peak performance on NVIDIA platforms
Influence the roadmap of communication libraries - NCCL & NVSHMEM
Collaborate with a very dynamic team across multiple time zones

Qualifications

Deep Learning Frameworks, Performance Benchmarking, Parallel Programming, Python, C++, CUDA, AI Models, HPC Communication Concepts, Adaptability, Flexibility

Required

B.S., M.S., or Ph.D. in Computer Science or a related field (or equivalent experience) with 5+ years of software engineering and HPC/AI experience
Development or integration experience with Deep Learning Frameworks such as PyTorch and JAX, and Inference Engines such as TRT-LLM, vLLM, and SGLang
Rapid prototyping and development with Python, C++, CUDA or related DSLs (Triton, cuTe)
Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g. torch.compile)
Experience conducting performance benchmarking on AI clusters. Familiarity with at least one performance profiler toolchain (PyTorch profiler, NVIDIA Nsight Systems)
Understanding of HPC/AI communication concepts (one-sided vs. two-sided communication, elasticity, resiliency, topology discovery, etc.)
Adaptability and passion to learn new areas and tools
Flexibility to work and communicate effectively across different teams and time zones

Preferred

Experience with parallel programming on at least one communication runtime (NCCL, NVSHMEM, MPI). Good understanding of computer system architecture, HW-SW interactions and operating systems principles (aka systems software fundamentals)
Expertise in one or more of these areas: Training, Distributed inference, MoE, Reinforcement Learning, kernel authoring (on CUDA, Triton, cuTe, etc). Experience with programming for compute & communication overlap in distributed runtimes
Experience with AI compiler pattern matching and lowering. Solid understanding of memory hierarchy, consistency model, and tensor layout

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted field is similar to this job]
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
Founder and CEO
Michael Kagan
Chief Technology Officer
Company data provided by crunchbase