Fabrik Talent · 5 days ago

Staff Inference Engineer (Multimodality/LLMs)

United States

Full-time

Remote

Lead/Staff

$300K/yr - $400K/yr

Fabrik Talent is partnering with a frontier AI company focused on building large-scale multimodal and language models. The role involves pushing the limits of LLM inference quality and performance while collaborating with researchers and engineers to develop production-ready systems.

Staffing & Recruiting

Responsibilities

Push the limits of LLM inference quality, latency, and cost

Turn cutting-edge research ideas into production-ready systems

Own inference metrics and performance in real deployments

Write fast, elegant code close to the metal (and know why it's fast)

Collaborate tightly with researchers and infra engineers to shape what ships next

Qualifications

LLM inference, Distributed systems, Low-precision computation, Python, C/C++, CUDA, Triton, Performance optimization, Research exposure, Collaboration

Required

Proven experience living at the intersection of research-grade LLMs and large-scale systems engineering

Deep understanding of transformers and modern LLM inference

Strong instincts for distributed systems and low-precision computation

Obsession with performance: kernels, memory, matmuls, and hardware bottlenecks

Comfort working across Python and lower-level stacks (C/C++, CUDA, Triton, etc.)

An opinionated, pragmatic mindset and a willingness to challenge "best practice" when it slows things down

Preferred

Research exposure is a plus, but shipping impact matters more

Benefits

Equity

Company

Fabrik Talent

At Fabrik Talent, we bring close to 20 years of experience in tech recruitment, with the last decade dedicated to machine learning, AI, and data engineering.

2-10 employees

https://www.fabriktalent.com

Funding

Current Stage

Early Stage

Company data provided by Crunchbase