Machine Learning Engineer — Training Optimization jobs in United States

Featherless AI · 16 hours ago

Machine Learning Engineer — Training Optimization

Featherless AI is seeking a Machine Learning Engineer focused on training optimization for large-scale model training. The role involves optimizing training pipelines and collaborating closely with researchers to improve model architecture and capabilities.

Artificial Intelligence (AI) · Cloud Computing · Database
H1B Sponsor Likely

Responsibilities

Optimize large-scale model training pipelines (throughput, convergence, stability, and cost)
Improve distributed training strategies (data, model, and pipeline parallelism)
Tune optimizers, schedulers, batch sizing, and precision (bf16 / fp16 / fp8)
Reduce training time and compute cost via profiling, bottleneck analysis, and systems-level improvements
Collaborate with researchers on architecture-aware training strategies
Build and maintain robust training infrastructure (checkpointing, fault tolerance, reproducibility)
Evaluate and integrate new training techniques (e.g. gradient checkpointing, ZeRO, FSDP, custom kernels)
Own training performance metrics and continuously push them forward

Qualifications

Training optimization · Large neural networks · PyTorch · Distributed systems · Backpropagation · Optimization algorithms · Training dynamics · Collaboration · Problem-solving · Adaptability

Required

Strong experience training large neural networks (LLMs or similarly large models)
Hands-on experience with training optimization (not just model usage)
Solid understanding of backpropagation, optimization algorithms, and training dynamics
Solid understanding of distributed systems for ML training
Experience with PyTorch (required)
Comfort working close to hardware (GPUs, memory, networking constraints)
Ability to move fluidly between research ideas and production-ready code

Preferred

Experience with large-scale distributed training (multi-node, multi-GPU)
Familiarity with DeepSpeed, FSDP, Megatron, or custom training stacks
Experience optimizing training on AMD or NVIDIA GPUs
Contributions to open-source ML infrastructure or research codebases
Exposure to non-Transformer architectures (RNNs, hybrid models, etc.)

Benefits

Competitive compensation
Meaningful equity

Company

Featherless AI

We enable serverless inference via our GPU orchestration and model load-balancing system.

H1B Sponsorship

Featherless AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship: chart not shown
Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage: Early Stage
Total Funding: $5M
Key Investors: Airbus Ventures
2025-10-31 · Series A
2025-03-17 · Seed · $5M
Company data provided by Crunchbase