Sr. MLOps Engineer jobs in United States

Lenstra · 5 days ago

Sr. MLOps Engineer

Lenstra is a company passionate about delivering top-quality solutions in various industry domains. As a Senior MLOps Engineer, you will build and operate the platform and tooling for identity-verification products, enabling a smooth transition from ML research to production.

Cloud Computing, Cloud Infrastructure, Consulting, Information Technology, IT Infrastructure, Software

Responsibilities

Run and evolve the ML compute layer on Kubernetes/EKS (CPU/GPU) for multi-tenant workloads, and make workloads portable across regions (region-aware scheduling, cross-region data access, and artifact portability)
Operate Argo Workflows and Dask Gateway as reliable, self-serve services used by engineers and researchers to orchestrate data prep, training, evaluation, and large-scale batch compute (installation, upgrades, security, quotas, autoscaling)
Build GitOps-native delivery for ML jobs and platform components (GitLab CI, Helm, FluxCD) with fast rollouts and safe rollbacks
Design and maintain the data platform built on LakeFS to enable experiment reproducibility, data lineage tracking, and automated governance processes
Own developer experience and enablement by creating clear APIs/CLIs and minimal UIs, maintaining comprehensive templates and documentation

Qualifications

Kubernetes (EKS), Dask, GitLab CI, Terraform, NVIDIA Triton, Argo Workflows, Weights & Biases, LakeFS, Apache Iceberg, AWS Athena, Snowflake, Python, FinOps best practices, Observability tools, Soft skills

Required

Experience with distributed compute frameworks such as Dask, Spark, or Ray
Familiarity with NVIDIA Triton or other inference servers
FinOps best practices and cost attribution for multi-tenant ML infrastructure
Exposure to multi-region designs (dataset replication strategies, compute placement, and latency optimization)
Container Orchestration: Kubernetes (EKS)
Compute: Argo Workflows for orchestration and Dask for distributed computing
ML Experiment Tracking: Weights & Biases
Data (Lakehouse & Versioning): Apache Iceberg + AWS Athena, LakeFS, Snowflake
CI/CD & GitOps: GitLab CI, Helm, FluxCD
Infrastructure as Code: Terraform
Observability: Prometheus/Grafana, Loki/Promtail, Datadog, Sentry
Languages & Libraries: Python (Django, FastAPI, Pydantic, boto3)

Company

Lenstra

Tech consulting services

Funding

Current Stage
Growth Stage
Company data provided by Crunchbase