Member of Technical Staff, Performance and Scale jobs in United States

Inferact · 1 day ago

Member of Technical Staff, Performance and Scale

Inferact is dedicated to advancing AI progress with vLLM, aiming to make AI inference cheaper and faster. The company is seeking an infrastructure engineer to design and implement distributed systems that enable vLLM to serve models across thousands of accelerators with minimal latency and maximum reliability.

Computer Software
H1B Sponsored

Responsibilities

Design and implement the foundational layers that enable vLLM to serve models across thousands of accelerators with minimal latency and maximum reliability

Qualifications

Rust · Go · C++ · Distributed systems · Network protocols · High-performance I/O · ML serving infrastructure · GPU programming · GPU interconnects · Debugging complex systems · System reliability · Performance improvement

Required

Bachelor's degree or equivalent experience in computer science, engineering, or similar
Strong systems programming skills in Rust, Go, or C++
Experience designing and building high-performance distributed systems at scale
Understanding of network protocols and high-performance I/O
Ability to debug complex distributed systems issues

Preferred

Experience with ML serving infrastructure and disaggregated inference architecture
Familiarity with GPU programming models and memory hierarchies
Knowledge of GPU interconnects (NVLink, InfiniBand, RoCE) and their performance characteristics
Track record of improving system reliability and performance at scale

Benefits

Generous health, dental, and vision benefits
401(k) company match

Company

Inferact

Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine.

Funding

Current Stage
Early Stage
Company data provided by Crunchbase