
Adaption · 3 days ago

Inference Engineer

Adaption is a company focused on building efficient intelligence that evolves in real time. They are seeking an Inference Engineer to build their core inference systems, including deployment pipelines and model serving, while thriving in a zero-to-one environment.

Computer Software

Responsibilities

Build from zero to one: design and implement our entire LLM inference infrastructure, making critical architectural decisions for scalability and performance
Own the inference stack: deploy, optimize, and maintain high-throughput, low-latency inference systems serving millions of requests
Framework expertise: leverage frameworks like vLLM, SGLang, or similar to maximize inference efficiency and cost-effectiveness
Performance optimization: fine-tune model serving configurations, implement batching strategies, and optimize GPU utilization
Infrastructure scaling: design auto-scaling systems that can handle variable traffic patterns while controlling costs
Monitoring & reliability: build comprehensive observability into our inference pipeline with proper alerting and incident response
Cross-functional collaboration: work closely with our ML and product teams to understand requirements and deliver optimal serving solutions

Qualifications

LLM inference systems · Inference frameworks · Distributed systems · Containerization · Cloud platforms · Performance optimization · Production experience · Soft skills

Required

Proven 0→1 experience: You've previously built LLM inference systems from scratch in a production environment
Framework proficiency: Hands-on experience with modern inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
Infrastructure expertise: Strong background in distributed systems, containerization (Docker/Kubernetes), and cloud platforms (AWS/GCP/Azure)
Performance mindset: Experience optimizing inference latency, throughput, and cost at scale
Production experience: You've deployed and maintained ML systems serving real users in production

Preferred

Experience in a fast-paced startup environment
Contributions to open-source inference tools and frameworks
Experience with model quantization, pruning, or other optimization techniques
Knowledge of CUDA programming and GPU optimization
Experience serving multi-modal models (vision, audio, etc.)

Benefits

Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
Well-Being: Comprehensive medical benefits and generous paid time off.

Company

Adaption


Funding

Current Stage
Early Stage
Company data provided by Crunchbase