Member of Technical Staff, Kernel Engineering (United States)

Inferact · 1 day ago

Member of Technical Staff, Kernel Engineering

Inferact's mission is to grow vLLM into the world's AI inference engine and to accelerate AI progress by making inference cheaper and faster. This role involves writing kernels and low-level optimizations that maximize performance on a range of accelerators, and collaborating with hardware vendors to ensure optimal integration with vLLM.

Computer Software
H-1B Sponsored

Responsibilities

Write the kernels and low-level optimizations that make vLLM the fastest inference engine in the world
Work directly with hardware vendors to ensure we're extracting maximum performance from every generation of hardware

Qualifications

CUDA kernels, GPU architecture, C++, Python, profiling tools, ML-specific kernel optimization, quantization techniques, accelerator platforms, compiler technologies, performance optimization methodologies, benchmark obsession, technical blogs

Required

Bachelor's degree or equivalent experience in computer science, engineering, or similar
Deep experience writing CUDA kernels or equivalent (CuTeDSL, Triton, TileLang, Pallas)
Strong understanding of GPU architecture: memory hierarchy, warp scheduling, tiling, tensor cores
Proficiency in C++ and Python with demonstrated ability to write high-performance code
Experience with profiling tools (Nsight, rocprof) and performance optimization methodologies
Obsession with benchmarks and squeezing every percentage point of speedup

Preferred

Experience with ML-specific kernel optimization (FlashAttention, fused kernels)
Knowledge of quantization techniques (INT8, FP8, mixed-precision)
Familiarity with multiple accelerator platforms (NVIDIA, AMD, TPU, Intel)
Experience with compiler technologies (LLVM, MLIR, XLA)

Benefits

Generous health, dental, and vision benefits
401(k) company match

Company

Inferact

Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine.

Funding

Current Stage
Early Stage
Company data provided by Crunchbase