Google DeepMind · 2 months ago
Technical Program Manager, RL Infrastructure & Reliability
DeepMind is focused on advancing artificial intelligence and is seeking a Technical Program Manager for Reinforcement Learning Infrastructure & Reliability. The role involves leading technical programs to enhance the reliability and performance of RL workloads, managing engineering initiatives, and coordinating with engineering teams.
Artificial Intelligence (AI) · Business Development · Foundational AI · Machine Learning
Responsibilities
Drive technical programs focused on optimizing the performance and efficiency of post-training and RL workloads. This includes quantitative analysis, developing shared dashboards, and guiding engineering execution on improvements.
Execute key projects from the post-training reliability roadmap, such as improving monitoring tools and centralizing core services, to enhance the stability of the entire stack
Own the technical project management for initiatives aimed at improving the long-term health, testability, and maintainability of the RL infrastructure codebases
Manage the engineering backlog and tactical execution for core RL framework development, ensuring progress is tracked and aligned with the team's strategic roadmap
Build effective working relationships with engineering teams, guiding alignment on project goals, managing interdependencies, and ensuring clear communication and risk management
Contribute to the broader program management of the Frameworks and Infrastructure team, providing clear stakeholder updates and supporting team-wide events
Qualifications
Required
Bachelor's degree in a technical field or equivalent practical experience
5 years of experience in program or project management in a technical software environment
Experience working directly with engineering teams on the software development lifecycle
Preferred
5+ years of relevant work experience
Experience with machine learning workflows, particularly in training, post-training, or MLOps. Direct experience with Reinforcement Learning (RL) is a strong plus.
Strong analytical skills, with experience in performance analysis, reliability engineering (SRE), or technical efficiency projects
Proficiency with project management and development tools (e.g., Jira, Gantt charts) for managing technical backlogs
Excellent interpersonal and communication skills, with a demonstrated ability to work effectively in ambiguous, fast-paced R&D environments
Benefits
Bonus
Equity
Benefits
Company
Google DeepMind
Google DeepMind aims to research and build safe artificial intelligence systems to solve intelligence and advance science and humanity. It is a sub-organization of Google.
Funding
Current Stage: Late Stage
Total Funding: unknown
2014-01-26: Acquired
2011-02-01: Series A
Company data provided by Crunchbase