Inferact
Member of Technical Staff, Kernel Engineering
Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. The role involves writing kernels and low-level optimizations to maximize performance on various accelerators, and collaborating with hardware vendors to ensure optimal integration with vLLM.
Responsibilities
Write the kernels and low-level optimizations that make vLLM the fastest inference engine in the world
Work directly with hardware vendors to ensure we're extracting maximum performance from every generation of hardware
Qualifications
Required
Bachelor's degree or equivalent experience in computer science, engineering, or similar
Deep experience writing CUDA kernels or equivalents (CuTeDSL, Triton, TileLang, Pallas)
Strong understanding of GPU architecture: memory hierarchy, warp scheduling, tiling, tensor cores
Proficiency in C++ and Python with demonstrated ability to write high-performance code
Experience with profiling tools (Nsight, rocprof) and performance optimization methodologies
Obsession with benchmarks and squeezing every percentage point of speedup
Preferred
Experience with ML-specific kernel optimization (FlashAttention, fused kernels)
Knowledge of quantization techniques (INT8, FP8, mixed-precision)
Familiarity with multiple accelerator platforms (NVIDIA, AMD, TPU, Intel)
Experience with compiler technologies (LLVM, MLIR, XLA)
Benefits
Generous health, dental, and vision benefits
401(k) company match
Company
Inferact
Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine.
Funding
Current Stage
Early Stage