Featherless AI
Machine Learning Engineer — Inference Optimization
Featherless AI is seeking a Machine Learning Engineer to optimize model inference performance at scale. The role involves profiling systems, implementing optimization techniques, and working with research engineers to improve performance in production.
Artificial Intelligence (AI) · Cloud Computing · Database
Responsibilities
Optimize inference latency, throughput, and cost for large-scale ML models in production
Profile GPU/CPU inference pipelines and identify bottlenecks (memory, kernels, batching, I/O)
Implement and tune techniques such as:
Quantization (fp16, bf16, int8, fp8)
KV-cache optimization & reuse
Speculative decoding, batching, and streaming
Model pruning or architectural simplifications for inference
Collaborate with research engineers to productionize new model architectures
Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)
Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups
Improve system reliability, observability, and cost efficiency under real workloads
Qualifications
Required
Strong experience in ML inference optimization or high-performance ML systems
Solid understanding of deep learning internals (attention, memory layout, compute graphs)
Hands-on experience with PyTorch (or similar) and model deployment
Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
Experience scaling inference for real users (not just research benchmarks)
Comfortable with ownership and ambiguity in a fast-moving startup environment
Preferred
Experience with LLM or long-context model inference
Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
Experience optimizing across different hardware vendors
Open-source contributions in ML systems or inference tooling
Background in distributed systems or low-latency services
Benefits
Competitive compensation + meaningful equity at Series A
Company
Featherless AI
We enable serverless inference via our GPU orchestration and model load-balancing system.
H1B Sponsorship
Featherless AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total sponsorships by year: 2025 (1)
Funding
Current Stage: Early Stage
Total Funding: $5M
Key Investors: Airbus Ventures
2025-10-31: Series A
2025-03-17: Seed · $5M
Company data provided by Crunchbase