DeepRec.ai · 2 hours ago
Senior Machine Learning Infrastructure Engineer
DeepRec.ai is building advanced AI systems with real physical capability, focusing on experimentation, engineering, and automated manufacturing. They are seeking a Senior ML Infrastructure / MLOps Engineer to design, operate, and scale the training, fine-tuning, and deployment infrastructure that powers large model development across a range of models.
Responsibilities
Build and maintain scalable infrastructure for training, fine-tuning, and distributed ML workflows
Develop dataset pipelines, versioning systems, experiment tracking, and reproducibility frameworks
Operate containerized training and inference environments, including CI/CD for models and evaluation tooling
Partner closely with researchers, RL teams, data engineers, and systems engineers to support rapid iteration and robust deployment
Qualifications
Required
Strong experience in ML infrastructure, distributed training, experiment management, or production ML systems
Comfort with containerization, orchestration, dataset governance, and model evaluation pipelines
Ability to design reliable, high-throughput training and deployment workflows
Enjoyment of working across ML, infrastructure, and data systems in a fast-moving research environment