Inferact
Member of Technical Staff, Kernel Engineering
Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. The role involves writing kernels and low-level optimizations to maximize performance on various accelerators, and collaborating with hardware vendors to ensure optimal integration with vLLM.
Responsibilities
Write the kernels and low-level optimizations that make vLLM the fastest inference engine in the world
Work directly with hardware vendors to ensure we're extracting maximum performance from every generation of hardware
Qualifications
Required
Bachelor's degree or equivalent experience in computer science, engineering, or similar
Deep experience writing CUDA kernels or equivalents (CuTeDSL, Triton, TileLang, Pallas)
Strong understanding of GPU architecture: memory hierarchy, warp scheduling, tiling, tensor cores
Proficiency in C++ and Python with demonstrated ability to write high-performance code
Experience with profiling tools (Nsight, rocprof) and performance optimization methodologies
Obsession with benchmarks and squeezing every percentage point of speedup
Preferred
Experience with ML-specific kernel optimization (FlashAttention, fused kernels)
Knowledge of quantization techniques (INT8, FP8, mixed-precision)
Familiarity with multiple accelerator platforms (NVIDIA, AMD, TPU, Intel)
Experience with compiler technologies (LLVM, MLIR, XLA)
Benefits
Generous health, dental, and vision benefits
401(k) company match
Company
Inferact
Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine.
Funding
Current Stage
Early Stage