Machine Learning Infrastructure Engineers jobs in United States

Shopify · 5 months ago

Machine Learning Infrastructure Engineers

Shopify is a company that empowers entrepreneurs and enterprises to reach their potential. The Machine Learning Infrastructure Engineer will build and operate the platform that powers AI, focusing on high-performance systems and enhancing the developer experience for ML teams.

E-Commerce · E-Commerce Platforms · Enterprise Software · SaaS
H1B Sponsor Likely

Responsibilities

Build and operate ML control planes, APIs, CLIs, and self-serve golden paths
Design and optimize multi-tenant GPU Kubernetes clusters, including autoscaling, scheduling, packing, and utilization
Own model lifecycle: training orchestration/experiments, registries/versioning, CI/CD, canary/blue-green, and safe rollback
Build real-time serving stacks (KServe/Seldon/TensorFlow Serving) and end-to-end pipelines for batch and streaming
Design feature platforms and engineer storage/data movement for datasets, features, and artifacts tuned for cost/performance
Implement observability and SLOs across pipelines, training, and inference; automate remediation and capacity planning
Partner with ML, data, and product teams to unblock delivery and accelerate idea-to-impact

Qualifications

Kubernetes expertise · GPU infrastructure experience · Infrastructure-as-code · Proficient in Python/Go/Java · Observability expertise · Distributed systems design · Model lifecycle tooling · Data infrastructure familiarity · Soft skills

Required

Proven platform/infrastructure engineering experience with a track record of shipping production systems and code
Deep Kubernetes/containerization expertise for ML workloads (operators, Helm, service mesh/gRPC) and multi-tenant clusters
Hands-on experience running GPU infrastructure at scale (NVIDIA ecosystem; scheduling/packing/optimization)
Strong distributed systems and API/service design fundamentals; experience with high-scale inference
Proficiency with infrastructure-as-code and automation (Terraform, Helm, GitOps) on major clouds (GCP/AWS/Azure)
Observability expertise (Prometheus/Grafana) and SLO-driven operations for ML systems
Proficient in Python/Go/Java; experience building developer tooling and self-service platforms

Preferred

Model serving and lifecycle tooling: KServe/Seldon/TensorFlow Serving, Kubeflow, MLflow/W&B, model registries, DVC
Feature store experience (Feast/Tecton) with online/offline parity and SLAs
Data infrastructure familiarity (Kafka, Spark/Flink) and stateful stores (Redis/MySQL); CI/CD for online/batch inference
Model performance optimization (batching, caching, quantization, distillation) and hardware-aware tuning
Experience with experimentation/A/B testing platforms and online evaluation frameworks

Company

Shopify is a cloud-based, multi-channel commerce platform designed for small and medium-sized businesses.

H1B Sponsorship

Shopify has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Chart: Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (57)
2024 (37)
2023 (9)
2022 (84)
2021 (41)
2020 (13)

Funding

Current Stage
Public Company
Total Funding
$122.25M
Key Investors
Bessemer Venture Partners, Klister Credit
2015-05-21 · IPO
2013-12-11 · Series C · $100M
2011-10-17 · Series B · $15M

Leadership Team

Tobias Lütke
CEO
Mikhail Parakhin
Chief Technology Officer
Company data provided by Crunchbase