Staff Software Engineer, ML Systems jobs in United States

Waymo · 2 days ago

Staff Software Engineer, ML Systems

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. They are seeking a Staff-level engineer to bridge the gap between Machine Learning Product teams and Core Infrastructure, focusing on optimizing ML systems for efficiency and reliability.

Artificial Intelligence (AI) · Automotive · Autonomous Vehicles · Sensor · Transportation
H1B Sponsor Likely

Responsibilities

Partner with Core Infrastructure teams to influence the roadmap for compute, storage, and scheduling. Translate the ML team’s product requirements into concrete infrastructure requests (e.g., accelerator topology needs, storage throughput requirements) and prioritize them effectively.
Serve as the technical escalation point for production blockers. Troubleshoot complex failures that span the stack, from Python-level OOM errors in training jobs to underlying cluster scheduling, containerization, or network latency issues. Instrument modules to actively monitor for, and recover from, instability.
Profile and optimize end-to-end pipelines. Identify bottlenecks in data loading (e.g., fetching data from distributed storage to accelerators) and implement C++ optimizations where Python overhead is too high.
Build robust CLI tools and middleware to improve the 'inner loop' of ML development. Automate tedious tasks in remote development environments to simplify change management and validation.
Write the necessary shims and wrappers to integrate new Core Infra features into the ML stack before they are officially supported, allowing the team to move faster than the platform baseline.

Qualifications

C++ · Python · Distributed Systems · Debugging · Machine Learning · Technical Consensus · Stakeholder Management · Leadership · Mentoring

Required

Strong proficiency in C++ (system performance, concurrency, memory management) and Python (ML modeling, scripting). Ability to write production-ready code in a large-scale monorepo environment.
Deep understanding of resource management (CPU/RAM/accelerator isolation), job scheduling, RPC subsystems, and distributed storage.
Fearless approach to debugging 'black box' system issues. Comfortable using profilers and digging into system logs to diagnose contention, deadlocks, or memory leaks.
Demonstrated ability to drive technical consensus across teams. Experience defining engineering standards, mentoring senior engineers, and managing complex stakeholder relationships.

Preferred

Experience with custom/proprietary build systems (e.g., Bazel-like environments)
Experience optimizing large-scale ML workloads on custom AI accelerators (TPUs/GPUs)
Familiarity with low-level serialization formats (e.g., Protocol Buffers)
Background in optimizing remote development environments or interactive notebook workflows

Benefits

Waymo’s discretionary annual bonus program
Equity incentive plan
Generous company benefits program

Company

Waymo is a mobility technology company that aims to improve transportation by developing self-driving solutions for travelers and daily commuters. It is a subsidiary of Alphabet.

H1B Sponsorship

Waymo has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total Sponsorships by Year
2025 (231)
2024 (175)
2023 (268)
2022 (306)
2021 (298)
2020 (317)

Funding

Current Stage: Late Stage
Total Funding: $11.1B
Key Investors: Alphabet
2024-07-23 · Series C · $5.6B
2021-06-16 · Series B · $2.5B
2020-05-12 · Series A · $750M

Leadership Team

Tekedra Mawakana
Co-Chief Executive Officer
Elisa de Martel
Chief Financial Officer
Company data provided by Crunchbase.