
Featherless AI · 17 hours ago

Machine Learning Engineer — Inference Optimization

Featherless AI is seeking a Machine Learning Engineer to optimize model inference performance at scale. The role involves profiling systems, implementing optimization techniques, and collaborating with research engineers to improve production-grade performance.

Artificial Intelligence (AI) · Cloud Computing · Database
H1B Sponsor Likely

Responsibilities

Optimize inference latency, throughput, and cost for large-scale ML models in production
Profile and bottleneck GPU/CPU inference pipelines (memory, kernels, batching, IO)
Implement and tune techniques such as:
  - Quantization (fp16, bf16, int8, fp8)
  - KV-cache optimization & reuse
  - Speculative decoding, batching, and streaming
  - Model pruning or architectural simplifications for inference
Collaborate with research engineers to productionize new model architectures
Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)
Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups
Improve system reliability, observability, and cost efficiency under real workloads
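As an aside (not part of the posting): the first technique in the list above, quantization, can be illustrated with a minimal symmetric int8 round-trip in plain Python. This is a toy sketch of the underlying arithmetic only; production inference work would use fp8/int8 kernels through a framework such as PyTorch or TensorRT.

```python
# Toy sketch of symmetric int8 post-training quantization:
# map floats to int8 codes with one shared scale, then recover
# approximate floats and measure the round-trip error.

def quantize_int8(values):
    """Quantize floats to int8 codes in [-127, 127] with a symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

weights = [0.42, -1.27, 0.003, 0.98, -0.55]
codes, scale = quantize_int8(weights)
recovered = dequantize(codes, scale)
max_err = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, recovered))
print(codes)
print(f"max round-trip error: {max_err:.5f}")
```

The largest-magnitude weight pins the scale, so outliers directly limit precision for the rest of the tensor; this is why real quantization schemes use per-channel or per-block scales rather than one global scale.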

Qualifications

ML inference optimization · Deep learning internals · PyTorch · GPU performance tuning · Inference frameworks · Scaling inference · Open-source contributions · Distributed systems · Low-latency services

Required

Strong experience in ML inference optimization or high-performance ML systems
Solid understanding of deep learning internals (attention, memory layout, compute graphs)
Hands-on experience with PyTorch (or similar) and model deployment
Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
Experience scaling inference for real users (not just research benchmarks)
Comfortable working in fast-moving startup environments with ownership and ambiguity

Preferred

Experience with LLM or long-context model inference
Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
Experience optimizing across different hardware vendors
Open-source contributions in ML systems or inference tooling
Background in distributed systems or low-latency services

Benefits

Competitive compensation + meaningful equity at Series A

Company

Featherless AI

We enable serverless inference via our GPU orchestration and model load-balancing system.

H1B Sponsorship

Featherless AI has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor.)
Total sponsorships: 1 in 2025.

Funding

Current Stage
Early Stage
Total Funding
$5M
Key Investors
Airbus Ventures
2025-10-31 · Series A
2025-03-17 · Seed · $5M
Company data provided by Crunchbase