Senior Machine Learning Infrastructure Engineer jobs in United States

DeepRec.ai · 1 week ago

Senior Machine Learning Infrastructure Engineer

DeepRec.ai is an early-stage AI company focused on building foundation models for physics to support industrial automation. They are seeking a Senior Machine Learning Infrastructure Engineer to manage the full ML infrastructure stack, working closely with the founding team to build and scale production-grade ML systems.

Hiring Manager
Sam Warwick

Responsibilities

Own distributed training and fine-tuning infrastructure across multi-GPU and multi-node clusters
Design and operate low-latency, highly reliable inference and model serving systems
Build secure fine-tuning pipelines allowing customers to adapt models to their data and workflows
Deliver deployments across cloud and on-prem environments, including enterprise and air-gapped setups
Design data pipelines for large-scale simulation and CFD datasets
Implement observability, monitoring, and debugging across training, serving, and data pipelines
Work directly with customers on deployment, integration, and scaling challenges
Move quickly from prototype to production infrastructure

Qualifications

ML infrastructure, AWS, Kubernetes, Python, Distributed training, Docker, Infrastructure-as-code, Model serving systems, Debugging, Customer-facing experience

Required

3+ years building and scaling ML infrastructure for training, fine-tuning, serving, or deployment
Strong experience with AWS, GCP, or Azure
Hands-on expertise with Kubernetes, Docker, and infrastructure-as-code
Experience with distributed training frameworks such as PyTorch Distributed, DeepSpeed, or Ray
Proven experience building production-grade inference systems
Strong Python skills and deep understanding of the end-to-end ML lifecycle
High execution velocity, strong debugging instincts, and comfort operating in ambiguity

Preferred

Background in physics, simulation, or computer-aided engineering software
Experience deploying ML systems into enterprise or regulated environments
Foundation model fine-tuning infrastructure experience
GPU performance optimization experience (CUDA, Triton, etc.)
Large-scale ML data engineering and validation pipelines
Experience at high-growth AI startups or leading AI research labs
Customer-facing or forward-deployed engineering experience
Open-source contributions to ML infrastructure

Benefits

Competitive compensation with meaningful equity
Strong benefits
Daily meals
Stipends
Immigration support

Company

DeepRec.ai

We are your Deep Tech recruitment specialists, driven by a mission to power progress in the world’s most exciting industries.

Funding

Current Stage
Growth Stage

Leadership Team

Hayley Killengrey
Co-Founder and Managing Director USA