Inferact
Member of Technical Staff, Performance and Scale
Inferact is dedicated to advancing AI progress with vLLM, aiming to make AI inference cheaper and faster. The company is seeking an infrastructure engineer to design and build the distributed systems that let vLLM serve models across thousands of accelerators.
Computer Software
Responsibilities
Design and implement the foundational layers that enable vLLM to serve models across thousands of accelerators with minimal latency and maximum reliability
Qualifications
Required
Bachelor's degree or equivalent experience in computer science, engineering, or similar
Strong systems programming skills in Rust, Go, or C++
Experience designing and building high-performance distributed systems at scale
Understanding of network protocols and high-performance I/O
Ability to debug complex distributed systems issues
Preferred
Experience with ML serving infrastructure and disaggregated inference architecture
Familiarity with GPU programming models and memory hierarchies
Knowledge of GPU interconnects (NVLink, InfiniBand, RoCE) and their performance characteristics
Track record of improving system reliability and performance at scale
Benefits
Generous health, dental, and vision benefits
401(k) company match
Company
Inferact
Inferact is a startup founded by creators and core maintainers of vLLM, the most popular open-source LLM inference engine.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase