NVIDIA · 22 hours ago
Senior Deep Learning Framework Communications Engineer
NVIDIA is a leader in Artificial Intelligence, High Performance Computing, and Visualization. They are seeking a motivated Deep Learning engineer to integrate advanced communication technologies into AI stacks and improve communication performance between GPUs, which is crucial for AI applications.
AI Infrastructure · Artificial Intelligence (AI) · Consumer Electronics · Foundational AI · GPU · Hardware · Software · Virtual Reality
Responsibilities
Integrate new communication library features into AI frameworks, from PoC through performance analysis to production
Perform deep analysis of AI workloads and frameworks to identify multi-GPU communication requirements and opportunities. Collaborate hands-on with teams working on the latest AI models
Improve AI compilers to hide communications or perform automatic fusion
Conduct in-depth AI workload performance characterization on multi-GPU clusters
Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads
Author custom communication or fused compute-communication kernels to showcase peak performance on NVIDIA platforms
Influence the roadmap of communication libraries - NCCL & NVSHMEM
Collaborate with a very dynamic team across multiple time zones
Qualifications
Required
B.S., M.S., or Ph.D. in Computer Science or a related field (or equivalent experience), with 5+ years of software engineering and HPC/AI experience
Development or integration experience with Deep Learning Frameworks such as PyTorch and JAX, and Inference Engines such as TRT-LLM, vLLM, and SGLang
Rapid prototyping and development with Python, C++, CUDA or related DSLs (Triton, cuTe)
Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g. torch.compile)
Experience conducting performance benchmarking on AI clusters. Familiarity with at least one performance profiler toolchain (PyTorch profiler, NVIDIA Nsight Systems)
Understanding of HPC/AI communication concepts (one-sided vs. two-sided communication, elasticity, resiliency, topology discovery, etc.)
Adaptability and passion to learn new areas and tools
Flexibility to work and communicate effectively across different teams and time zones
Preferred
Experience with parallel programming on at least one communication runtime (NCCL, NVSHMEM, MPI). Good understanding of computer system architecture, HW-SW interactions and operating systems principles (aka systems software fundamentals)
Expertise in one or more of these areas: Training, Distributed inference, MoE, Reinforcement Learning, kernel authoring (on CUDA, Triton, cuTe, etc). Experience with programming for compute & communication overlap in distributed runtimes
Experience with AI compiler pattern matching and lowering. Solid understanding of memory hierarchy, consistency model, and tensor layout
Benefits
Equity
Benefits
Company
NVIDIA
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.
H1B Sponsorship
NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)
Funding
Current Stage: Public Company
Total Funding: $4.09B
Key Investors: ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity
Recent News
Digital Journal
2026-01-23
Company data provided by Crunchbase