Embedding VC · 1 month ago

Member of Technical Staff - ML Infrastructure & Performance

San Mateo, CA

Full-time

Onsite

Mid Level

Embedding VC is focused on AI for creating real-time interactive content and is seeking a Member of Technical Staff specializing in ML Infrastructure & Performance. The role centers on improving throughput, latency, and cost: making deployed models significantly faster and cheaper to run while maintaining output quality.

Artificial Intelligence (AI), Impact Investing

Responsibilities

GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs

Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV-cache reuse; speculative decoding/Medusa; mixture-of-agents routing

Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning

Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving

Systems: Ray/k8s/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B testing infrastructure, canary releases + rollback

Qualification

CUDA, TensorRT-LLM, FSDP, Quantization, Ray, K8s, Prometheus, Grafana, OpenTelemetry

Required

Hands-on experience in each area listed under Responsibilities: GPU performance, serving stack, parallelism, quantization/PEFT, and systems

Previous experience at infra-heavy startups such as Databricks or Roblox

Company

Embedding VC

Embedding invests in early-stage Generative AI startups.

Founded in 2023

Menlo Park, California, USA

2-10 employees

https://embedding.vc

Funding

Current Stage

Early Stage

Leadership Team

Roger Jie Luo

Founder & Managing Partner

Company data provided by Crunchbase