adaption

AI Systems & Inference Frameworks Engineer

New York, NY

Full-time

Hybrid

Mid, Senior Level

Adaption is a company focused on building efficient AI systems that evolve in real time. The AI Systems & Inference Frameworks Engineer will work directly with the founders to design and build the inference and optimization systems for the company's core product, bridging research and production while owning the lifecycle of LLM inference.

Computer Software

Responsibilities

Inference Research & Systems: design and build our LLM inference stack from zero to one, exploring and implementing advanced techniques for low-latency, high-throughput serving of language and multimodal models

Frameworks & Optimization: develop and optimize inference using modern frameworks (e.g., vLLM, SGLang, TensorRT-LLM), experimenting with batching strategies, KV-cache management, parallelism, and GPU utilization to push performance and cost efficiency

Software–Hardware Co-Design: collaborate closely with founders and model developers to analyze bottlenecks across the stack, co-optimizing model execution, infrastructure, and deployment pipelines

Qualifications

LLM inference systems, Inference frameworks, GPU optimization, Python programming, CUDA, C++, Model quantization, Multimodal inference, Open-source contributions

Required

Strong experience building and optimizing LLM inference systems in production or research environments

Hands-on expertise with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or similar

Deep performance mindset with experience in GPU-backed systems, latency/throughput optimization, and resource efficiency

Solid understanding of transformer inference, serving architectures, and KV-cache–based execution

Strong programming skills in Python; experience with CUDA, Triton, or C++ is a plus

Comfort working in ambiguous, zero-to-one environments and driving research ideas into production systems

Preferred

Experience with model quantization or pruning

Speculative decoding

Multimodal inference

Open-source contributions

Prior work in systems or ML research labs

Benefits

Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.

Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.

Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.

Well-Being: Comprehensive medical benefits and generous paid time off.

Company

adaption

San Francisco, US

2-10 employees

https://adaptionlabs.ai

Funding

Current Stage

Early Stage

Company data provided by Crunchbase