Senior Software Engineer, Cloud-Native Stack – CSP Engagements jobs in United States

NVIDIA · 1 day ago

Senior Software Engineer, Cloud-Native Stack – CSP Engagements

NVIDIA is a leading technology company known for its advancements in Artificial Intelligence and High-Performance Computing. It is seeking a Senior Software Engineer for its CSP Engagements team to focus on the cloud-native stack for AI/ML datacenters. The role involves defining customer workflows, prototyping stack enhancements, and debugging complex issues in multi-rack environments.

Artificial Intelligence (AI), Consumer Electronics, GPU, Hardware, Software, Virtual Reality
Growth Opportunities
H1B Sponsor Likely
Hiring Manager
Bella Yanovsky

Responsibilities

Perform deep-dive debugging of multi-rack, multi-tenant clusters, including scheduler behavior, container runtime issues, device-plugin crashes, and RDMA/IB fabric anomalies
Gather customer requirements and prototype feature extensions for Kubernetes operators, Slurm plugins, and custom micro-services that expose new GPU capabilities
Drive joint architecture reviews and “whiteboard” sessions with CSP and internal platform teams; convert findings into RFCs and upstream pull requests
Create reproducible testbeds (Helm/Ansible/Terraform) that mirror customer environments; automate validation and benchmark suites
Deliver technical collateral (design docs, how-to guides, demo scripts) and present at customer on-sites, KubeCon, and SlurmUG
Collaborate with AE, FAE, and Solution Architect teams to deliver integrated customer solutions and technical documentation

Qualifications

Kubernetes internals, Slurm, GPU integration, Cloud-native stacks, CI/CD, Observability tools, Distributed systems, Customer-facing engineering, Prototyping, Communication, Technical documentation

Required

Strong source-level expertise in Kubernetes internals (scheduler, CRI/CNI/CSI, operators) and Slurm (federation, power-save, plugins)
Hands-on experience integrating next-gen GPUs (Blackwell/GB200/GB300) or comparable accelerators into containerized clusters
Proven track record debugging large-scale, cloud-native stacks across networking (RDMA/RoCE), storage, and control planes
Customer-facing engineering or solutions-architect background: requirements gathering, PoC ownership, roadmap influence
Familiarity with CI/CD (GitHub Actions, Tekton), observability (Prometheus, OpenTelemetry), and infrastructure-as-code
Excellent communication: able to switch between deep technical detail and high-level business impact
6+ years of professional software development experience in distributed systems (Go, Rust, C/C++ or Python for tooling)
BS or MS (or equivalent experience) in Computer Engineering, Computer Science, or related field

Preferred

Upstream contributions to Kubernetes, Slurm, Volcano, or similar projects
Experience with GPU computing (CUDA) and deep learning workloads

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
Founder and CEO
Michael Kagan
Chief Technology Officer
Company data provided by Crunchbase