Embedding VC · 1 week ago
Member of Technical Staff - Efficient ML
Embedding VC is introducing Moonlake, an AI platform for creating world simulations. Moonlake is seeking a Member of Technical Staff focused on efficient machine learning, responsible for improving training efficiency, GPU performance, and inference speed, and for keeping the infrastructure reliable.
Artificial Intelligence (AI) · Impact Investing
Responsibilities
Dataloaders, operator fusion, activation rematerialization, gradient checkpointing
FSDP/ZeRO, tensor and pipeline parallelism; NCCL tuning
Nsight profiling, Triton/CUDA kernels, fused ops
FlashAttention-style speedups, sequence packing, KV-cache optimizations
Low-latency serving, continuous batching, speculative decoding
Quantization (GPTQ/AWQ), distillation, pruning
SLURM/K8s multi-node jobs, checkpoint hygiene
Determinism, environment pinning, GPU failure handling
Qualifications
Required
Experience with dataloaders, operator fusion, activation rematerialization, and gradient checkpointing
Knowledge of FSDP/ZeRO, tensor and pipeline parallelism, and NCCL tuning
Proficiency in Nsight profiling, Triton/CUDA kernels, and fused ops
Experience with FlashAttention-style speedups, sequence packing, and KV-cache optimizations
Skills in low-latency serving, continuous batching, and speculative decoding
Familiarity with quantization techniques (GPTQ/AWQ), distillation, and pruning
Experience with SLURM/K8s multi-node jobs and checkpoint hygiene
Understanding of determinism, environment pinning, and GPU failure handling