
CoreWeave · 5 months ago

Principal Engineer, Inference

CoreWeave is the AI Hyperscaler™, delivering cutting-edge cloud services for AI. The company is seeking a Principal Engineer to lead development of its next-generation Inference Platform: architecting and building GPU inference services while partnering with teams across the organization to optimize AI applications.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Information Technology · Machine Learning
No H1B · U.S. Citizen Only

Responsibilities

Define the technical roadmap for ultra-low-latency, high-throughput inference
Evaluate and influence adoption of runtimes and frameworks (Triton, vLLM, TensorRT-LLM, Ray Serve, TorchServe) and guide build-vs-buy decisions
Design Kubernetes-native control-plane components that deploy, autoscale, and monitor fleets of model-server pods spanning thousands of GPUs
Implement advanced optimizations (micro-batching, speculative decoding, KV-cache reuse, early-exit heuristics, tensor/stream-parallel inference) to squeeze every microsecond out of large-model serving; a micro-batching sketch appears after this list
Build intelligent request routing and adaptive scheduling to maximize GPU utilization while guaranteeing strict P99 latency SLAs
Create real-time observability, live debugging hooks, and automated rollback/traffic-shift for model versioning
Develop cost-per-token and cost-per-request analytics so customers can instantly select the ideal hardware tier
Write production code, reference implementations, and performance benchmarks across gRPC/HTTP, CUDA Graphs, and NCCL/SHARP fast-paths
Lead deep-dive investigations into network, PCIe, NVLink, and memory-bandwidth bottlenecks
Coach engineers on large-scale inference best practices and performance profiling
Partner with lighthouse customers to launch and optimize mission-critical, real-time AI applications
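The micro-batching named above is the central latency/throughput trade in model serving: batching requests amortizes per-call GPU overhead at the price of a small queueing delay. Below is a minimal sketch of a dynamic micro-batcher, assuming a hypothetical forward_batch model call and illustrative queue parameters; this is not CoreWeave's implementation, only an illustration of the concept.

    import asyncio
    import time

    def forward_batch(prompts):
        # Hypothetical stand-in for a batched GPU model call; batching
        # amortizes per-request overhead across the whole batch.
        return [f"output for {p!r}" for p in prompts]

    class MicroBatcher:
        # Collects requests until max_batch_size is reached or max_wait_ms
        # elapses, then serves them as one batch. Values are illustrative.
        def __init__(self, max_batch_size=8, max_wait_ms=2.0):
            self.max_batch_size = max_batch_size
            self.max_wait = max_wait_ms / 1000.0
            self.queue = asyncio.Queue()

        async def submit(self, prompt):
            fut = asyncio.get_running_loop().create_future()
            await self.queue.put((prompt, fut))
            return await fut

        async def run(self):
            while True:
                item = await self.queue.get()      # wait for the first request
                batch = [item]
                deadline = time.monotonic() + self.max_wait
                while len(batch) < self.max_batch_size:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        break
                    try:
                        batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                    except asyncio.TimeoutError:
                        break
                outputs = forward_batch([p for p, _ in batch])  # one "GPU" call
                for (_, fut), out in zip(batch, outputs):
                    fut.set_result(out)

    async def main():
        mb = MicroBatcher()
        worker = asyncio.create_task(mb.run())
        print(await asyncio.gather(*(mb.submit(f"prompt-{i}") for i in range(5))))
        worker.cancel()

    asyncio.run(main())

The tension between max_wait_ms and batch size is exactly the knob a production scheduler tunes against a P99 latency SLA: waiting longer fills larger batches and raises throughput, but every extra millisecond in the queue eats into the latency budget.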

Qualifications

GPU inference services · Kubernetes-native architecture · Real-time ML inference · PyTorch · TensorFlow · Micro-batch scheduling · Cost-per-token analytics · CI/CD pipelines · Communication · Mentorship · Collaboration

Required

10+ years building distributed systems or HPC/cloud services, with 4+ years focused on real-time ML inference or other latency-critical data planes
Demonstrated expertise in micro-batch schedulers, GPU resource isolation, KV caching, speculative decoding, and mixed precision (BF16/FP8) inference
Deep knowledge of PyTorch or TensorFlow serving internals, CUDA kernels, NCCL/SHARP, RDMA, NUMA, and GPU interconnect topologies
Proven track record of driving sub-50 ms global P99 latencies and optimizing cost-per-token / cost-per-request on multi-node GPU clusters; a cost-per-token sketch appears after this list
Fluency with Kubernetes (or Slurm/Ray) at production scale plus CI/CD, service meshes, and observability stacks (Prometheus, Grafana, OpenTelemetry)
Excellent communicator who influences architecture across teams and presents complex trade-offs to executives and customers
Bachelor's or Master's in CS, EE, or related field (or equivalent practical experience)
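The cost-per-token optimization in the requirement above reduces to simple arithmetic; the sketch below ranks two hypothetical hardware tiers using made-up prices and throughputs (not CoreWeave rates) purely to show how the comparison works.

    # All prices and throughput figures below are made-up placeholders.
    tiers = {
        "gpu-tier-a": {"usd_per_hour": 2.00, "tokens_per_second": 4_000},
        "gpu-tier-b": {"usd_per_hour": 4.50, "tokens_per_second": 11_000},
    }

    def cost_per_million_tokens(usd_per_hour, tokens_per_second):
        tokens_per_hour = tokens_per_second * 3600
        return usd_per_hour / tokens_per_hour * 1_000_000

    for name, t in tiers.items():
        c = cost_per_million_tokens(t["usd_per_hour"], t["tokens_per_second"])
        print(f"{name}: ${c:.3f} per 1M tokens")

    # gpu-tier-a: $0.139 per 1M tokens
    # gpu-tier-b: $0.114 per 1M tokens
    # The pricier GPU wins on cost-per-token here because its throughput
    # scales faster than its hourly price.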

Preferred

Code contributions to open-source inference frameworks (vLLM, Triton, Ray Serve, TensorRT-LLM, TorchServe)
Experience operating multi-region inference fleets or streaming-token services at a hyperscaler or AI research lab
Publications/talks on latency optimization, token streaming, or advanced model-server architectures

Benefits

Medical, dental, and vision insurance - 100% paid for by CoreWeave
Company-paid Life Insurance
Voluntary supplemental life insurance
Short- and long-term disability insurance
Flexible Spending Account
Health Savings Account
Tuition Reimbursement
Ability to participate in the Employee Stock Purchase Program (ESPP)
Mental Wellness Benefits through Spring Health
Family-Forming support provided by Carrot
Paid Parental Leave
Flexible, full-service childcare support with Kinside
401(k) with a generous employer match
Flexible PTO
Catered lunch each day in our office and data center locations
A casual work environment
A work culture focused on innovative disruption

Company

CoreWeave

CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads.

Funding

Current Stage: Public Company
Total Funding: $23.37B
Key Investors: Jane Street Capital, Stack Capital, Coatue

2025-12-08 · Post-IPO Debt · $2.54B
2025-11-12 · Post-IPO Debt · $1B
2025-08-20 · Post-IPO Secondary

Leadership Team

Michael Intrator
Chief Executive Officer

Nitin Agrawal
Chief Financial Officer
Company data provided by Crunchbase