Senior ML Storage Infrastructure Engineer jobs in United States

Zoox · 2 weeks ago

Senior ML Storage Infrastructure Engineer

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. The company is seeking a Senior ML Storage Infrastructure Engineer to work on custom high-performance computing (HPC) infrastructure that supports machine learning workflows across its software divisions. The role involves designing and optimizing storage infrastructure, driving GPU efficiency, and building essential tools for software teams.

Artificial Intelligence (AI) · Autonomous Vehicles · Machine Learning · Robotics · Transportation
H1B Sponsor Likely

Responsibilities

Design, build, and optimize a petabyte-scale, in-house HPC storage infrastructure, ensuring high performance and reliability for our machine learning workloads across both cloud and on-premises data centers
Drive GPU efficiency by strategically colocating storage and compute, architecting a storage layer that keeps tens of thousands of GPUs fully utilized and prevents I/O bottlenecks
Drive key initiatives in training and storage optimization by partnering with ML practitioners, applying your deep understanding of frameworks such as PyTorch and TensorFlow to meet their evolving demands
Investigate and adopt new distributed system paradigms and cutting-edge technologies to ensure our infrastructure can scale to meet ever-growing computational and storage demands
Create production-grade web service APIs, SDKs, and other essential tools to deliver a world-class developer experience for all software teams at Zoox

Qualifications

Distributed storage systems · High-performance computing · Cloud platforms · Machine learning frameworks · Python · Java · Parallel filesystems · Kubernetes · Soft skills

Required

Experience designing and building high-performance, distributed storage systems (object/file) for large-scale, GPU-bound workloads
Proficiency in Python, Java, or similar languages for developing data-intensive, high-performance applications
Hands-on experience with cloud platforms (AWS, GCP, Azure), using their storage, GPU, and observability services to provide usage showback for ML practitioners
Bachelor's degree in Computer Science or a related field with a strong foundation in data structures and systems design

Preferred

Experience with parallel filesystems (e.g., Lustre, Amazon FSx for Lustre) and their integration with container orchestrators via Kubernetes CSI drivers
Deep knowledge of ML frameworks like PyTorch and TensorFlow, and workload schedulers such as SLURM or Kubernetes
Familiarity with emerging AI paradigms, including agentic systems, and observability tools like OpenTelemetry

Benefits

Paid time off (e.g., sick leave, vacation, bereavement)
Unpaid time off
Zoox Stock Appreciation Rights
Amazon RSUs
Health insurance
Long-term care insurance
Long-term and short-term disability insurance
Life insurance

Company

Zoox is an AI robotics company that provides mobility-as-a-service and self-driving car services. It is a subsidiary of Amazon.

H1B Sponsorship

Zoox has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor.)
Trends of Total Sponsorships
2025 (293)
2024 (297)
2023 (209)
2022 (204)
2021 (131)
2020 (83)

Funding

Current Stage: Late Stage
Total Funding: $955M
Key Investors: Grok Ventures
2020-06-26 · Acquired
2019-10-21 · Convertible Note · $200M
2018-07-08 · Series B · $465M

Leadership Team

Nelson Pedreiro
Sr. Vice President, Hardware
Zheng Gao
Director of Hardware Engineering
Company data provided by Crunchbase