
CoreWeave · 5 hours ago

Director of Engineering, Inference Services

CoreWeave is The Essential Cloud for AI™, delivering a platform of technology and tools for innovators. The Director of Engineering will lead a world-class engineering organization to design, build, and operate the fastest GPU inference services, focusing on optimizing model-serving and ensuring operational excellence.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Information Technology · Machine Learning

No H1B · U.S. Citizen Only

Responsibilities

Define and continuously refine the end-to-end Inference Platform roadmap, prioritizing low-latency, high-throughput model serving and world-class developer UX
Set technical standards for runtime selection, GPU/CPU heterogeneity, quantization, and model-optimization techniques
Design and implement a global, Kubernetes-native inference control plane that delivers <50 ms P99 latencies at scale
Build adaptive micro-batching, request-routing, and autoscaling mechanisms that maximize GPU utilization while meeting strict SLAs
Integrate model-optimization pipelines (TensorRT, ONNX Runtime, BetterTransformer, AWQ, etc.) for frictionless deployment
Implement state-of-the-art runtime optimizations—including speculative decoding, KV-cache reuse across batches, early-exit heuristics, and tensor-parallel streaming—to squeeze every microsecond out of LLM inference while retaining accuracy
Establish SLOs/SLA dashboards, real-time observability, and self-healing mechanisms for thousands of models across multiple regions
Drive cost-performance trade-off tooling that makes it trivial for customers to choose the best hardware tier for each workload
Hire, mentor, and grow a diverse team of engineers and managers passionate about large-scale AI inference
Foster a customer-obsessed, metrics-driven engineering culture with crisp design reviews and blameless post-mortems
Partner closely with Product, Orchestration, Networking, and Security teams to deliver a unified CoreWeave experience
Engage directly with flagship customers (internal and external) to gather feedback and shape the roadmap

Qualifications

Large-scale distributed systems · GPU inference services · Kubernetes · LLM optimization · Model-serving runtimes · Cost-performance optimization · Real-time observability · Communication · Mentoring engineers · Cross-team collaboration

Required

10+ years building large-scale distributed systems or cloud services, with 5+ years leading multiple engineering teams
Proven success delivering mission-critical model-serving or real-time data-plane services (e.g., Triton, TorchServe, vLLM, Ray Serve, SageMaker Inference, GCP Vertex Prediction)
Deep understanding of GPU/CPU resource isolation, NUMA-aware scheduling, micro-batching, and low-latency networking (gRPC, QUIC, RDMA)
Track record of optimizing cost-per-token / cost-per-request and hitting sub-100 ms global P99 latencies
Expertise in Kubernetes, service meshes, and CI/CD for ML workloads; familiarity with Slurm, Kueue, or other schedulers a plus
Hands-on experience with LLM optimization (quantization, compilation, tensor parallelism, speculative decoding) and hardware-aware model compression
Excellent communicator who can translate deep technical concepts into clear business value for C-suite and engineering audiences
Bachelor's or Master's in CS, EE, or related field (or equivalent practical experience)

Preferred

Experience operating multi-region inference fleets at a cloud provider or hyperscaler
Contributions to open-source inference or MLOps projects
Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry) for AI workloads
Background in edge inference, streaming inference, or real-time personalization systems

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to Participate in Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Funding

Current Stage
Public Company
Total Funding
$23.37B
Key Investors
Jane Street Capital · Stack Capital · Coatue

2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary

Leadership Team

Michael Intrator
Chief Executive Officer

Nitin Agrawal
Chief Financial Officer
Company data provided by Crunchbase