Staff Inference Engineer (Multimodality/LLMs) jobs in United States

Fabrik Talent · 5 days ago

Staff Inference Engineer (Multimodality/LLMs)

Fabrik Talent is partnering with a frontier AI company focused on building large-scale multimodal and language models. The role involves pushing the limits of LLM inference quality and performance while collaborating with researchers and engineers to develop production-ready systems.
Staffing & Recruiting

Responsibilities

Push the limits of LLM inference quality, latency, and cost
Turn cutting-edge research ideas into production-ready systems
Own inference metrics and performance in real deployments
Write fast, elegant code close to the metal (and know why it’s fast)
Collaborate tightly with researchers and infra engineers to shape what ships next

Qualifications

LLM inference, Distributed systems, Low-precision computation, Python, C/C++, CUDA, Triton, Performance optimization, Research exposure, Collaboration

Required

Proven experience living at the intersection of research-grade LLMs and large-scale systems engineering
Deep understanding of transformers and modern LLM inference
Strong instincts for distributed systems and low-precision computation
Obsession with performance: kernels, memory, matmuls, and hardware bottlenecks
Comfort working across Python and lower-level stacks (C/C++, CUDA, Triton, etc.)
Opinionated, pragmatic, and happy to challenge 'best practice' when it slows things down

Preferred

Research exposure is a plus, but shipping impact matters more

Benefits

Equity

Company

Fabrik Talent

At Fabrik Talent, we bring close to 20 years of experience in tech recruitment, with the last decade dedicated to machine learning, AI, and data engineering.

Funding

Current Stage: Early Stage