Google · 7 hours ago
Senior Staff Software Engineer, SRE, ML Fleet Systems
Google is a leading technology company, and it is seeking a Senior Staff Software Engineer for the ML Fleet Systems team. The role involves shaping the architecture and implementation of systems that ensure the scalable and efficient deployment of machine learning resources, while also tackling complex challenges that affect teams across the organization.
Apps · Artificial Intelligence (AI) · Cloud Storage · Search Engine · SEO
Responsibilities
Define and drive the long-term technical outlook, strategy, and roadmap for critical software systems that manage Alphabet's ML fleet. This includes capacity management for all ML resources such as TPUs, GPUs, compute, storage, and networking
Act as the Technical Lead for the internal Capacity Management Business team within ML Fleet, providing technical direction, mentorship, and guidance to build and evolve our capacity management solutions from operations-driven processes into robust engineered systems
Collaborate closely with engineering partners (e.g., Onefleet, Spatial Flex, Operational Data Store (ODS)) to design and deliver joint engineered solutions to our customers
Identify, scope, and solve broad and ambiguous challenges that impact the efficiency, reliability, and cost-effectiveness of the entire ML fleet. Turn these challenges into strategic opportunities and actionable plans
Qualifications
Required
Bachelor's degree in Computer Science, a related field, or equivalent practical experience
8 years of experience with software development in one or more programming languages
4 years of experience leading projects and providing technical leadership
3 years of experience in designing, analyzing, and troubleshooting distributed systems
Preferred
Master's degree or PhD in Computer Science, or a related technical field
Experience with infrastructure optimization, performance analysis, and cost reduction in large-scale environments
Experience with Colossus and other relevant Google storage systems (e.g., Bigtable, Spanner, Woodshed)
Understanding of resource management systems (e.g., Borg, Kubernetes, Flex), cluster management, and scheduling algorithms
Familiarity with Machine Learning hardware accelerators (e.g., TPUs, GPUs) and their lifecycle management
Excellent communication and collaboration skills, with the ability to build consensus across organizational boundaries
Benefits
Bonus
Equity
Benefits
Company
Google specializes in internet-related services and products, including search, advertising, and software. It is a subsidiary of Alphabet.
H-1B Sponsorship
Google has a track record of sponsoring H-1B visas. Please note that this does not guarantee sponsorship for this specific role. The data below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (8763)
2024 (8872)
2023 (9682)
2022 (11626)
2021 (9109)
2020 (9785)
Funding
Current Stage: Public Company
Total Funding: $26.1M
Key Investors: Andy Bechtolsheim
2004-08-19: IPO
1999-06-07: Series Unknown · $25M
1998-11-01: Angel · $1M
Company data provided by Crunchbase