Senior MLOps Engineer jobs in United States

DeepRec.ai · 2 hours ago

Senior MLOps Engineer

DeepRec.ai is building AI-native systems at the intersection of machine learning, scientific computing, and materials innovation. They are seeking a Senior MLOps Engineer to own and operate a production-grade GPU platform supporting large-scale model training and low-latency inference for computational chemistry and LLM workloads.

Hiring Manager
Sam Warwick

Responsibilities

Own and operate a production-grade GPU platform supporting large-scale model training and low-latency inference for computational chemistry and LLM workloads serving thousands of users
Hold end-to-end responsibility for the ML platform, spanning Kubernetes-based GPU orchestration, cloud infrastructure and Infrastructure-as-Code, ML pipelines, CI/CD, observability, reliability, and disaster recovery
Design and operate hardened, multi-tenant ML systems on AWS
Build and optimize high-performance inference stacks using vLLM and TensorRT-based runtimes
Drive measurable improvements in latency, throughput, and GPU utilization through batching, caching, quantization, and kernel-level optimizations
Establish SLO-driven operational standards, robust monitoring and alerting, on-call readiness, and repeatable release and rollback workflows
Work closely with research scientists and product teams to reliably productionize models
Support distributed training and inference across multi-node GPU clusters
Ensure high-throughput data pipelines for large scientific datasets

Qualifications

MLOps, Kubernetes, AWS, Python, Terraform, CI/CD, GPU orchestration, ML pipelines, Disaster recovery, Observability, Reliability, Soft skills

Required

5+ years of experience in MLOps, platform, or infrastructure engineering
Deep hands-on experience running GPU workloads on Kubernetes, including scheduling, autoscaling, multi-tenancy, and debugging GPU runtime issues
Strong Terraform and cloud-native fundamentals
Strong proficiency in Python
Proven track record of operating scalable, high-performance ML systems in production
Experience supporting scientific, computational chemistry, or other physics-based workloads
Prior exposure to large-scale LLM serving
Experience with distributed training frameworks
Experience in regulated production environments

Company

DeepRec.ai

We are your Deep Tech recruitment specialists, driven by a mission to power progress in the world’s most exciting industries.

Funding

Current Stage
Growth Stage

Leadership Team

Hayley Killengrey
Co-Founder and Managing Director USA
Company data provided by Crunchbase