NVIDIA · 5 months ago

Senior Software Engineer, AI Inference

United States

Full-time

Remote

Senior Level

$184K/yr - $288K/yr

6+ years of experience

NVIDIA is a leader in AI computing and has been transforming computer graphics and accelerated computing for over 25 years. They are seeking a Senior Software Engineer to work on distributed model management systems for AI inference workloads, collaborating with engineers and researchers to develop scalable APIs and services.

AI Infrastructure · Artificial Intelligence (AI) · Consumer Electronics · Foundational AI · GPU · Hardware · Software · Virtual Reality

Growth Opportunities

H1B Sponsor Likely

Responsibilities

Build and maintain distributed model management systems, including Rust-based runtime components, for large-scale AI inference workloads

Implement inference scheduling and deployment solutions on Kubernetes and Slurm, while driving advances in scaling, orchestration, and resource management

Collaborate with infrastructure engineers and researchers to develop scalable APIs, services, and end-to-end inference workflows

Create monitoring, benchmarking, automation, and documentation processes to ensure low-latency, robust, and production-ready inference systems on GPU clusters

Qualifications

Deep Learning · Distributed Systems · GPU Programming · Kubernetes · Rust · PyTorch · Collaboration · Problem Solving · Documentation

Required

Bachelor's, Master's, or PhD in Computer Science, ECE, or related field (or equivalent experience)

6+ years of professional software engineering experience

Strong understanding of modern ML architectures with a keen intuition for optimizing inference performance

Willingness to take full ownership of problems end-to-end, proactively acquiring any knowledge or skills needed to deliver results

Familiar with or able to quickly gain expertise in vLLM, SGLang, PyTorch, NVIDIA GPUs, and supporting software stacks such as NIXL, NCCL, CUDA, as well as HPC technologies like InfiniBand, MPI, and NVLink

Experienced in architecting, building, monitoring, and debugging production-grade distributed systems; bonus if you've worked on performance-critical ones

Preferred

Experience with inference-serving frameworks (e.g., Dynamo Inference Server, TensorRT, ONNX Runtime) and deploying/managing LLM inference pipelines at scale

Contributions to large-scale, low-latency distributed systems (open-source preferred) with proven expertise in high-availability infrastructure

Strong background in GPU inference performance tuning, CUDA-based systems, and operating across cloud-native and hybrid environments (AWS, GCP, Azure)

Benefits

Equity

Benefits

Company

NVIDIA

Glassdoor: 4.6

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

Founded in 1993

Santa Clara, California, USA

10001+ employees

https://www.nvidia.com

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships

2025 (1877)

2024 (1355)

2023 (976)

2022 (835)

2021 (601)

2020 (529)

Funding

Current Stage

Public Company

Total Funding

$4.09B

Key Investors

ARPA-E · ARK Investment Management · SoftBank Vision Fund

2023-05-09 · Grant · $5M

2022-08-09 · Post-IPO Equity · $65M

2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang

Founder and CEO

Michael Kagan

Chief Technology Officer

Company data provided by Crunchbase