
Adaption · 3 days ago

Inference Engineer

Adaption is a company focused on building efficient intelligence that evolves in real time. They are seeking an Inference Engineer to build their core inference systems, including deployment pipelines and model serving, while thriving in a zero-to-one environment.

Computer Software

Responsibilities

Build from zero to one: design and implement our entire LLM inference infrastructure, making critical architectural decisions for scalability and performance
Own the inference stack: deploy, optimize, and maintain high-throughput, low-latency inference systems serving millions of requests
Framework expertise: leverage frameworks like vLLM, SGLang, or similar to maximize inference efficiency and cost-effectiveness
Performance optimization: fine-tune model serving configurations, implement batching strategies, and optimize GPU utilization
Infrastructure scaling: design auto-scaling systems that can handle variable traffic patterns while controlling costs
Monitoring & reliability: build comprehensive observability into our inference pipeline with proper alerting and incident response
Cross-functional collaboration: work closely with our ML and product teams to understand requirements and deliver optimal serving solutions

Qualifications

LLM inference systems · Inference frameworks · Distributed systems · Containerization · Cloud platforms · Performance optimization · Production experience · Soft skills

Required

Proven 0→1 experience: You've previously built LLM inference systems from scratch in a production environment
Framework proficiency: Hands-on experience with modern inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
Infrastructure expertise: Strong background in distributed systems, containerization (Docker/Kubernetes), and cloud platforms (AWS/GCP/Azure)
Performance mindset: Experience optimizing inference latency, throughput, and cost at scale
Production experience: You've deployed and maintained ML systems serving real users in production

Preferred

Experience in a fast-paced startup environment
Contributions to open-source inference tools and frameworks
Experience with model quantization, pruning, or other optimization techniques
Knowledge of CUDA programming and GPU optimization
Experience serving multi-modal models (vision, audio, etc.)

Benefits

Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
Well-Being: Comprehensive medical benefits and generous paid time off.

Company

Adaption


Funding

Current Stage
Early Stage
Company data provided by Crunchbase