AI Researcher — Inference Optimization jobs in United States

Featherless AI · 23 hours ago

AI Researcher — Inference Optimization

Featherless AI is seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. The role focuses on improving latency, throughput, and cost efficiency in real-world production environments by developing techniques that optimize inference performance and working with engineering teams to deploy the optimized pipelines.

Artificial Intelligence (AI) · Cloud Computing · Database
H1B Sponsor Likely

Responsibilities

Research and develop techniques to optimize inference performance for large neural networks
Improve latency, throughput, memory efficiency, and cost per inference
Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications)
Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs decode optimization)
Benchmark inference workloads across hardware accelerators
Collaborate with engineering teams to deploy optimized inference pipelines
Translate research insights into production-ready improvements

Qualifications

Inference optimization · Machine learning · Deep learning · Python · PyTorch · Inference tooling · Experiment design · Open-source contribution · Research publication · Hardware experience · Communication skills

Required

Strong background in machine learning, deep learning, or AI systems
Hands-on experience optimizing inference for large-scale models
Proficiency in Python and modern ML frameworks (e.g., PyTorch)
Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime)
Ability to design experiments and communicate results clearly

Preferred

Experience deploying production inference systems at scale
Familiarity with distributed and multi-GPU inference
Experience contributing to open-source ML or inference frameworks
Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields
Experience working close to hardware (CUDA, ROCm, profiling tools)

Company

Featherless AI

We enable serverless inference via our GPU orchestration and model load-balancing system.

H1B Sponsorship

Featherless AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage: Early Stage
Total Funding: $5M
Key Investors: Airbus Ventures
2025-10-31 · Series A
2025-03-17 · Seed · $5M
Company data provided by Crunchbase