Director of Engineering, Inference Services
CoreWeave is The Essential Cloud for AI™, delivering a platform of technology and tools for AI innovators. The Director of Engineering, Inference Services will lead a world-class engineering organization that designs, builds, and operates industry-leading GPU inference services, with a focus on model-serving optimization and operational excellence.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Information Technology · Machine Learning
Responsibilities
Define and continuously refine the end-to-end Inference Platform roadmap, prioritizing low-latency, high-throughput model serving and world-class developer UX
Set technical standards for runtime selection, GPU/CPU heterogeneity, quantization, and model-optimization techniques
Design and implement a global, Kubernetes-native inference control plane that delivers <50 ms P99 latencies at scale
Build adaptive micro-batching, request-routing, and autoscaling mechanisms that maximize GPU utilization while meeting strict SLAs
Integrate model-optimization pipelines (TensorRT, ONNX Runtime, BetterTransformer, AWQ, etc.) for frictionless deployment
Implement state-of-the-art runtime optimizations—including speculative decoding, KV-cache reuse across batches, early-exit heuristics, and tensor-parallel streaming—to squeeze every microsecond out of LLM inference while retaining accuracy
Establish SLOs/SLA dashboards, real-time observability, and self-healing mechanisms for thousands of models across multiple regions
Drive cost-performance trade-off tooling that makes it trivial for customers to choose the best HW tier for each workload
Hire, mentor, and grow a diverse team of engineers and managers passionate about large-scale AI inference
Foster a customer-obsessed, metrics-driven engineering culture with crisp design reviews and blameless post-mortems
Partner closely with Product, Orchestration, Networking, and Security teams to deliver a unified CoreWeave experience
Engage directly with flagship customers (internal and external) to gather feedback and shape the roadmap
Qualifications
Required
10+ years building large-scale distributed systems or cloud services, with 5+ years leading multiple engineering teams
Proven success delivering mission-critical model-serving or real-time data-plane services (e.g., Triton, TorchServe, vLLM, Ray Serve, SageMaker Inference, GCP Vertex Prediction)
Deep understanding of GPU/CPU resource isolation, NUMA-aware scheduling, micro-batching, and low-latency networking (gRPC, QUIC, RDMA)
Track record of optimizing cost-per-token / cost-per-request and hitting sub-100 ms global P99 latencies
Expertise in Kubernetes, service meshes, and CI/CD for ML workloads; familiarity with Slurm, Kueue, or other schedulers a plus
Hands-on experience with LLM optimization (quantization, compilation, tensor parallelism, speculative decoding) and hardware-aware model compression
Excellent communicator who can translate deep technical concepts into clear business value for C-suite and engineering audiences
Bachelor's or Master's in CS, EE, or related field (or equivalent practical experience)
Preferred
Experience operating multi-region inference fleets at a cloud provider or hyperscaler
Contributions to open-source inference or MLOps projects
Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) for AI workloads
Background in edge inference, streaming inference, or real-time personalization systems
Benefits
Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to participate in the Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption
Company
CoreWeave
CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.
Funding
Current Stage: Public Company
Total Funding: $23.37B
Key Investors: Jane Street Capital, Stack Capital, Coatue
Recent Rounds:
2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary
Company data provided by Crunchbase