CoreWeave · 1 day ago
Technical Program Manager – Cluster Orchestration & Model Benchmarking
CoreWeave is The Essential Cloud for AI™, providing a platform for innovators to build and scale AI. The Technical Program Manager will drive programs for cluster orchestration and model benchmarking, collaborating with engineering and product teams to enhance orchestration systems and ensure performance and cost efficiency of AI workloads.
Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Information Technology, Machine Learning
Responsibilities
Drive end-to-end program management for cluster orchestration initiatives—spanning SUNK, Kubernetes, and emerging workload schedulers
Lead cross-functional efforts to deliver next-generation cluster orchestration capabilities for distributed AI training and inference workloads
Partner with engineering and product to define roadmaps for cluster utilization, scheduling efficiency, preemption logic, multi-tenant fairness, and workload resilience
Own the execution of model benchmarking programs—establishing frameworks, datasets, and metrics to measure model performance, throughput, latency, and cost across hardware types and orchestration environments
Develop and scale processes for cross-team dependency management, performance testing, and release management, including ownership of external releases for SUNK and other cluster orchestrators: planning, coordination, and rollout of customer-facing updates
Collaborate with infrastructure and DevOps teams to ensure orchestration systems meet CoreWeave’s reliability and scalability goals
Build program dashboards, success metrics, and feedback loops to improve workload scheduling efficiency, GPU and cluster utilization, and time-to-deployment
Create strong communication channels between AI Platform Engineering, Infrastructure, and Product to align roadmap priorities and deliver predictable, high-impact outcomes
Qualifications
Required
Bachelor's degree in a technical field or equivalent experience
8+ years of technical program management experience in distributed systems, cloud infrastructure, or ML/AI platforms
Proven success leading programs involving large-scale orchestration or scheduling systems (e.g., Kubernetes, Ray, Slurm, Kueue, or proprietary systems)
Strong technical fluency in distributed computing, job scheduling, Kubernetes orchestration, and benchmarking methodologies
Demonstrated ability to define success metrics and drive measurable improvements in performance, reliability, or efficiency
Exceptional communication and collaboration skills, with a track record of aligning multiple teams and stakeholders on complex technical initiatives
Preferred
Experience with model benchmarking frameworks, profiling tools, or distributed test harnesses (e.g., MLPerf, vLLM benchmarks, custom evaluation pipelines)
Understanding of GPU types, model parallelism, and distributed training/inference performance trade-offs
Experience designing or managing benchmarking data pipelines and visualization tooling
Background in building operational maturity and visibility within high-growth, multi-team technical organizations
Benefits
Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to participate in the Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption
Company
CoreWeave
CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.
Funding
Current Stage: Public Company
Total Funding: $23.37B
Key Investors: Jane Street Capital, Stack Capital, Coatue
2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary
Recent News
TheStreet.com · 2026-01-03
Company data provided by Crunchbase