Inferact
Member of Technical Staff, Performance and Scale
Inferact is dedicated to advancing AI progress with vLLM, aiming to make AI inference cheaper and faster. The company is seeking an infrastructure engineer to design and build the distributed systems that let vLLM serve models across thousands of accelerators.
Computer Software
Responsibilities
Design and implement the foundational layers that enable vLLM to serve models across thousands of accelerators with minimal latency and maximum reliability
Qualifications
Required
Bachelor's degree or equivalent experience in computer science, engineering, or similar
Strong systems programming skills in Rust, Go, or C++
Experience designing and building high-performance distributed systems at scale
Understanding of network protocols and high-performance I/O
Ability to debug complex distributed systems issues
Preferred
Experience with ML serving infrastructure and disaggregated inference architecture
Familiarity with GPU programming models and memory hierarchies
Knowledge of GPU interconnects (NVLink, InfiniBand, RoCE) and their performance characteristics
Track record of improving system reliability and performance at scale
Benefits
Generous health, dental, and vision benefits
401(k) company match
Company
Inferact
Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase