adaption
Inference Engineer
Adaption is a company building efficient intelligence that evolves in real time. It is seeking an Inference Engineer to stand up its core inference systems, including deployment pipelines and model serving, and to thrive in a zero-to-one environment.
Computer Software
Responsibilities
Build from zero to one: design and implement our entire LLM inference infrastructure, making critical architectural decisions for scalability and performance
Own the inference stack: deploy, optimize, and maintain high-throughput, low-latency inference systems serving millions of requests
Framework expertise: leverage frameworks like vLLM, SGLang, or similar to maximize inference efficiency and cost-effectiveness
Performance optimization: fine-tune model serving configurations, implement batching strategies, and optimize GPU utilization
Infrastructure scaling: design auto-scaling systems that can handle variable traffic patterns while controlling costs
Monitoring & reliability: build comprehensive observability into our inference pipeline with proper alerting and incident response
Cross-functional collaboration: work closely with our ML and product teams to understand requirements and deliver optimal serving solutions
Qualifications
Required
Proven 0→1 experience: You've previously built LLM inference systems from scratch in a production environment
Framework proficiency: Hands-on experience with modern inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
Infrastructure expertise: Strong background in distributed systems, containerization (Docker/Kubernetes), and cloud platforms (AWS/GCP/Azure)
Performance mindset: Experience optimizing inference latency, throughput, and cost at scale
Production experience: You've deployed and maintained ML systems serving real users in production
Preferred
Experience in a fast-paced startup environment
Contributions to open-source inference tools and frameworks
Experience with model quantization, pruning, or other optimization techniques
Knowledge of CUDA programming and GPU optimization
Experience serving multi-modal models (vision, audio, etc.)
Benefits
Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
Well-Being: Comprehensive medical benefits and generous paid time off.
Company
adaption
Funding
Current Stage
Early Stage