Director, Technical Program Management - AI and ML Platforms jobs in United States

NVIDIA · 4 days ago

Director, Technical Program Management - AI and ML Platforms

NVIDIA is a leading technology company seeking a Director of Technical Program Management to lead AI/ML Platform initiatives within the DGX Cloud Infrastructure team. This role focuses on coordinating large cross-functional programs that improve the development and deployment of AI models, ensuring seamless integration of hardware and orchestration for optimal performance.

AI Infrastructure, Artificial Intelligence (AI), Consumer Electronics, Foundational AI, GPU, Hardware, Software, Virtual Reality
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Lead and scale the Technical Program Management organization responsible for the DGX Cloud AI/ML platform, enabling more than 1,000 NVIDIA researchers globally
Drive the roadmap for end-to-end AI/ML infrastructure, spanning cluster bring-up, workload orchestration, GPU resource management, and integration with MLOps pipelines
Collaborate with technology and innovation leaders to define platform needs, align compute strategy with AI model development, and provide a seamless researcher experience
Lead complex programs involving next-generation systems (e.g., GB200) and fleet-wide scaling initiatives across OCI, GCP, and other hyperscalers
Own platform efficiency and capacity management, using deep understanding of scheduling systems (e.g., Slurm, hybrid models) to optimize job placement, utilization, and turnaround time
Establish data-driven operational metrics (availability, occupancy, wait times, throughput) and use them to guide continuous improvement and prioritization
Implement governance and visibility frameworks that drive alignment, predictability, and accountability across AI platform initiatives
Represent DGX Cloud programs to senior leadership, clearly articulating impact, risk, and value across engineering and research organizations

Qualifications

Technical Program Management, AI/ML Systems Implementation, Job Scheduling Slurm, Job Scheduling Kubernetes, Resource Optimization, Cloud, On-Prem Architectures, Observability Tools Grafana, Observability Tools Prometheus, Executive Communication, Data-Driven Decision Making, Team Leadership

Required

15+ years of overall technical program management experience, including 7+ years leading and developing TPM teams in infrastructure, AI/ML, or platform engineering domains
Demonstrated success implementing AI and machine learning systems and platform initiatives at large scale, encompassing workload coordination, data pipeline integration, model training environments, and GPU fleet management
Deep technical understanding of AI/ML workflows, job scheduling (Slurm, Kubernetes, hybrid orchestration), and large-scale distributed systems
Proficiency in optimizing resource usage and monitoring performance metrics in compute-heavy settings
Experience building platforms across cloud and on-prem hybrid architectures, integrating with internal and external MLOps stacks
Proficiency with observability and telemetry tools (e.g., Grafana, Prometheus) for infrastructure monitoring and performance analysis
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience)

Preferred

Proficient in AI/ML systems, model lifecycle oversight, and developer tools for extensive training tasks
Track record driving R&D productivity platforms and reducing friction for machine learning practitioners
Experience in new product introduction (NPI) for research and infrastructure systems
Deep familiarity with cloud compute and orchestration technologies, and a passion for automation and operational excellence
Executive communication skills, able to translate complex technical programs into clear business and research outcomes

Benefits

Equity
Benefits

Company

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.

H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1877)
2024 (1355)
2023 (976)
2022 (835)
2021 (601)
2020 (529)

Funding

Current Stage
Public Company
Total Funding
$4.09B
Key Investors
ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09 · Grant · $5M
2022-08-09 · Post-IPO Equity · $65M
2021-02-18 · Post-IPO Equity

Leadership Team

Jensen Huang
Founder and CEO
Michael Kagan
Chief Technology Officer
Company data provided by Crunchbase