NVIDIA · 4 months ago
Cluster Deployment Operations Engineer - NVIS
NVIDIA is a leading technology company known for its innovative hardware platforms. They are seeking a dedicated Cluster Deployment Operations Engineer to support product deployments and collaborate with engineering and field organizations to ensure successful operationalization of new hardware in production environments.
AI Infrastructure, Artificial Intelligence (AI), Consumer Electronics, Foundational AI, GPU, Hardware, Software, Virtual Reality
Responsibilities
Playing an integral role in NVIDIA’s New Product Introduction (NPI) team, acting as the link between engineering and the NVIS field team for cluster deployment and management solutions. We bridge the gap between product roadmaps and real-world deployments
Collaborating closely with engineering and product teams to review and influence design decisions for products centered around large-scale AI Factory deployments, tracking progress towards production across the product development lifecycle
Describing architectural and design changes and building clear, actionable tasks for the field, including standardized deployment guides, configuration methodologies, and validation workflows. Presenting regularly to the field to ensure organizational alignment
Validating complex cluster configurations for performance, scalability, and resilience, ensuring they meet the requirements of real-world customer scenarios
Supporting NVIDIA's mission by ensuring our breakthrough technologies are successfully deployed for global customers by both NVIDIA and our OEM partners
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)
10+ years of experience in at least two of the following: HPC/large-scale cluster administration, Linux systems engineering, infrastructure automation (e.g., Ansible, Salt), or data center operations
5+ years of direct, hands-on experience provisioning, managing, and optimizing bare-metal clusters using NVIDIA Base Command Manager (BCM) or similar technology (e.g., proprietary Cloud Service Provider tools, Warewulf, xCAT)
Expert knowledge of how Slurm and Kubernetes are best deployed, managed, and used, including workload submission and resource management
Proficiency in Python and Bash scripting for automation, cluster validation, and workflow optimization
Hands-on experience using cluster telemetry and dashboard tools to assess HPC and AI clusters (e.g., Prometheus, Grafana, DCGM, and similar observability stacks)
Outstanding written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non-technical collaborators
A customer-first attitude, self-motivation, and a proactive approach to leadership in diverse environments
Preferred
Proficiency with cluster networking including InfiniBand and Spectrum-X
Experience with NVIDIA Mission Control, NVIDIA's AI Factory management platform
Familiarity with CI/CD workflows in an infrastructure context, including tools such as Git, GitLab, and Jenkins
Experience using large language models (LLMs) as part of a software development or content creation workflow; we rely heavily on LLMs to accelerate content delivery
Background in Professional Services, customer-facing deployment, and solutions optimization. We value NPI team members who have experienced the complexities that our field teams encounter every day. We also value industry certifications such as CKA/CKAD (Certified Kubernetes Administrator/Developer), RHCE, or other industry-recognized Linux/HPC credentials
Benefits
Equity
Comprehensive benefits package
Company
NVIDIA
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.
H1B Sponsorship
NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart; the highlighted field is similar to this job)
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)
Funding
Current Stage: Public Company
Total Funding: $4.09B
Key Investors: ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity
Company data provided by Crunchbase