Vibotek LLC · 2 months ago
ML Ops Support Engineer
Vibotek LLC is seeking an MLOps L2 Support Engineer to provide 24/7 production support for machine learning and data pipelines. The role involves troubleshooting ML workflows and ensuring high availability of ML models in production environments.
Information Technology & Services
Responsibilities
Provide L2 support for MLOps production environments, ensuring uptime and reliability
Troubleshoot ML pipelines, data processing jobs, and API issues
Monitor logs, alerts, and performance metrics using Dataiku, Prometheus, Grafana, or AWS tools such as CloudWatch
Perform root cause analysis (RCA) and resolve incidents within SLAs
Escalate unresolved issues to L3 engineering teams when needed
Manage Dataiku DSS workflows, troubleshoot job failures, and optimize performance
Monitor and support Dataiku plugins, APIs, and automation scenarios
Collaborate with Data Scientists and Data Engineers to debug ML model deployments
Perform version control and CI/CD integration for Dataiku projects
Support CI/CD pipelines for ML model deployment (Bamboo, Bitbucket, etc.)
Deploy ML models and data pipelines using Docker, Kubernetes, or Dataiku Flow
Automate monitoring and alerting for ML model drift, data quality, and performance
Monitor AWS-based ML workloads (SageMaker, Lambda, ECS, S3, RDS)
Manage storage and compute resources for ML workflows
Support database connections, data ingestion, and ETL pipelines (SQL, Spark, Kafka)
Ensure secure access control for ML models and data pipelines
Support audit, compliance, and governance for Dataiku and MLOps workflows
Respond to security incidents related to ML models and data access
Qualifications
Required
5+ years in MLOps, Data Engineering, or Production Support
Strong experience in Dataiku workflows, scenarios, plugins, and APIs
Hands-on experience with AWS ML services (SageMaker, Lambda, S3, RDS, ECS, IAM)
Familiarity with GitHub Actions, Jenkins, or Terraform
Proficiency in Python, Bash, and SQL for automation and debugging
Experience with Prometheus, Grafana, CloudWatch, or ELK Stack
Ability to handle on-call support, weekend shifts, and SLA-based issue resolution
Preferred
Experience with Docker, Kubernetes, or OpenShift
Familiarity with TensorFlow Serving, MLflow, or Dataiku Model API
Experience with Spark, Databricks, Kafka, or Snowflake
ITIL Foundation, AWS ML, or Dataiku certification
Company
Vibotek LLC
We screen and shortlist candidates before presenting them to our clients, reducing hiring time and cost.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase