Fabrik Talent · 5 days ago
Staff Inference Engineer (Multimodality/LLMs)
Fabrik Talent is partnering with a frontier AI company focused on building large-scale multimodal and language models. The role involves pushing the limits of LLM inference quality and performance while collaborating with researchers and engineers to develop production-ready systems.
Staffing & Recruiting
Responsibilities
Push the limits of LLM inference quality, latency, and cost
Turn cutting-edge research ideas into production-ready systems
Own inference metrics and performance in real deployments
Write fast, elegant code close to the metal (and know why it’s fast)
Collaborate tightly with researchers and infra engineers to shape what ships next
Qualifications
Required
Proven experience at the intersection of research-grade LLMs and large-scale systems engineering
Deep understanding of transformers and modern LLM inference
Strong instincts for distributed systems and low-precision computation
Obsession with performance: kernels, memory, matmuls, and hardware bottlenecks
Comfort working across Python and lower-level stacks (C/C++, CUDA, Triton, etc.)
Someone opinionated, pragmatic, and happy to challenge 'best practice' when it slows things down
Preferred
Research exposure is a plus, but shipping impact matters more
Benefits
Equity
Company
Fabrik Talent
At Fabrik Talent, we bring close to 20 years of experience in tech recruitment, with the last decade dedicated to machine learning, AI, and data engineering.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase