
CoreWeave · 5 hours ago

Director of Engineering, Inference Services

CoreWeave is The Essential Cloud for AI™, delivering a platform of technology and tools for innovators. The Director of Engineering will lead a world-class engineering organization to design, build, and operate the fastest GPU inference services, focusing on optimizing model-serving and ensuring operational excellence.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Information Technology · Machine Learning

No H1B · U.S. Citizen Only

Responsibilities

Define and continuously refine the end-to-end Inference Platform roadmap, prioritizing low-latency, high-throughput model serving and world-class developer UX
Set technical standards for runtime selection, GPU/CPU heterogeneity, quantization, and model-optimization techniques
Design and implement a global, Kubernetes-native inference control plane that delivers <50 ms P99 latencies at scale
Build adaptive micro-batching, request-routing, and autoscaling mechanisms that maximize GPU utilization while meeting strict SLAs
Integrate model-optimization pipelines (TensorRT, ONNX Runtime, BetterTransformer, AWQ, etc.) for frictionless deployment
Implement state-of-the-art runtime optimizations—including speculative decoding, KV-cache reuse across batches, early-exit heuristics, and tensor-parallel streaming—to squeeze every microsecond out of LLM inference while retaining accuracy
Establish SLOs/SLA dashboards, real-time observability, and self-healing mechanisms for thousands of models across multiple regions
Drive cost-performance trade-off tooling that makes it trivial for customers to choose the best hardware tier for each workload
Hire, mentor, and grow a diverse team of engineers and managers passionate about large-scale AI inference
Foster a customer-obsessed, metrics-driven engineering culture with crisp design reviews and blameless post-mortems
Partner closely with Product, Orchestration, Networking, and Security teams to deliver a unified CoreWeave experience
Engage directly with flagship customers (internal and external) to gather feedback and shape the roadmap

Qualifications

Large-scale distributed systems · GPU inference services · Kubernetes · LLM optimization · Model-serving runtimes · Cost-performance optimization · Real-time observability · Communication · Mentoring engineers · Cross-team collaboration

Required

10+ years building large-scale distributed systems or cloud services, with 5+ years leading multiple engineering teams
Proven success delivering mission-critical model-serving or real-time data-plane services (e.g., Triton, TorchServe, vLLM, Ray Serve, SageMaker Inference, GCP Vertex Prediction)
Deep understanding of GPU/CPU resource isolation, NUMA-aware scheduling, micro-batching, and low-latency networking (gRPC, QUIC, RDMA)
Track record of optimizing cost-per-token / cost-per-request and hitting sub-100 ms global P99 latencies
Expertise in Kubernetes, service meshes, and CI/CD for ML workloads; familiarity with Slurm, Kueue, or other schedulers a plus
Hands-on experience with LLM optimization (quantization, compilation, tensor parallelism, speculative decoding) and hardware-aware model compression
Excellent communicator who can translate deep technical concepts into clear business value for C-suite and engineering audiences
Bachelor's or Master's in CS, EE, or related field (or equivalent practical experience)

Preferred

Experience operating multi-region inference fleets at a cloud provider or hyperscaler
Contributions to open-source inference or MLOps projects
Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) for AI workloads
Background in edge inference, streaming inference, or real-time personalization systems

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to Participate in Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Funding

Current Stage
Public Company
Total Funding
$23.37B
Key Investors
Jane Street Capital · Stack Capital · Coatue

2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary

Leadership Team

Michael Intrator
Chief Executive Officer

Nitin Agrawal
Chief Financial Officer
Company data provided by Crunchbase