Embedding VC · 1 month ago

Member of Technical Staff - ML Infrastructure & Performance

San Mateo

Full-time

Onsite

Mid Level

Moonlake builds AI for creating real-time interactive content and is seeking a Member of Technical Staff to improve the throughput, latency, and cost of its models. The role involves optimizing GPU performance, managing serving stacks, and ensuring system scalability.

Artificial Intelligence (AI), Impact Investing

Responsibilities

GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs

Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing

Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning

Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving

Systems: Ray/k8s/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback

Qualifications

CUDA, TensorRT, Parallelism, Quantization, Kubernetes, Observability, Soft Skills

Required

Experience with GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs

Experience with serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing

Experience with parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning

Experience with quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving

Experience with systems: Ray/k8s/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback

Previous experience at infra-heavy startups such as Databricks or Roblox

Company

Embedding VC

Embedding invests in early-stage Generative AI startups.

Founded in 2023

Menlo Park, California, USA

2-10 employees

https://embedding.vc

Funding

Current Stage

Early Stage

Leadership Team

Roger Jie Luo

Founder & Managing Partner

Company data provided by Crunchbase