Member of Technical Staff - ML Infrastructure & Performance jobs in United States

Embedding VC · 1 month ago

Member of Technical Staff - ML Infrastructure & Performance

Moonlake builds AI for creating real-time interactive content and is seeking a Member of Technical Staff to improve the throughput, latency, and cost of its models. The role involves optimizing GPU performance, managing the serving stack, and ensuring system scalability.

Artificial Intelligence (AI) · Impact Investing

Responsibilities

GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs
Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing
Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning
Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving
Systems: Ray/k8s/Argo, observability (Prom/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback

Qualifications

CUDA · TensorRT · Parallelism · Quantization · Kubernetes · Observability · Soft Skills

Required

Experience with GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs
Experience with serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing
Experience with parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning
Experience with quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving
Experience with systems: Ray/k8s/Argo, observability (Prom/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback
Previous experience at infra-heavy startups such as Databricks or Roblox

Company

Embedding VC

Embedding invests in early-stage Generative AI startups.

Funding

Current Stage
Early Stage

Leadership Team

Roger Jie Luo
Founder & Managing Partner
Company data provided by Crunchbase