AI Systems & Inference Frameworks Engineer jobs in United States

adaption · 10 hours ago

AI Systems & Inference Frameworks Engineer

Adaption is a company focused on building efficient AI systems that evolve in real time. The AI Systems & Inference Frameworks Engineer will work directly with the founders to design and build the inference and optimization systems for the company's core product, bridging research and production while owning the full lifecycle of LLM inference.

Computer Software

Responsibilities

Inference Research & Systems: design and build our LLM inference stack from zero to one, exploring and implementing advanced techniques for low-latency, high-throughput serving of language and multimodal models
Frameworks & Optimization: develop and optimize inference using modern frameworks (e.g., vLLM, SGLang, TensorRT-LLM), experimenting with batching strategies, KV-cache management, parallelism, and GPU utilization to push performance and cost efficiency
Software–Hardware Co-Design: collaborate closely with founders and model developers to analyze bottlenecks across the stack, co-optimizing model execution, infrastructure, and deployment pipelines

Qualifications

LLM inference systems, Inference frameworks, GPU optimization, Python programming, CUDA, C++, Model quantization, Multimodal inference, Open-source contributions

Required

Strong experience building and optimizing LLM inference systems in production or research environments
Hands-on expertise with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or similar
Deep performance mindset with experience in GPU-backed systems, latency/throughput optimization, and resource efficiency
Solid understanding of transformer inference, serving architectures, and KV-cache–based execution
Strong programming skills in Python; experience with CUDA, Triton, or C++ a plus
Comfort working in ambiguous, zero-to-one environments and driving research ideas into production systems

Preferred

Experience with model quantization or pruning
Speculative decoding
Multimodal inference
Open-source contributions
Prior work in systems or ML research labs

Benefits

Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
Well-Being: Comprehensive medical benefits and generous paid time off.

Company

adaption


Funding

Current Stage
Early Stage
Company data provided by Crunchbase