Technical Program Manager, RL Infrastructure & Reliability jobs in United States

Google DeepMind · 2 months ago

Technical Program Manager, RL Infrastructure & Reliability

DeepMind is focused on advancing artificial intelligence and is seeking a Technical Program Manager for Reinforcement Learning Infrastructure & Reliability. The role involves leading technical programs to enhance the reliability and performance of RL workloads, managing engineering initiatives, and coordinating with engineering teams.

Artificial Intelligence (AI), Business Development, Foundational AI, Machine Learning
Growth Opportunities

Responsibilities

Drive technical programs focused on optimizing the performance and efficiency of post-training and RL workloads. This includes quantitative analysis, developing shared dashboards, and guiding engineering execution on improvements
Execute key projects from the post-training reliability roadmap, such as improving monitoring tools and centralizing core services, to enhance the stability of the entire stack
Own the technical project management for initiatives aimed at improving the long-term health, testability, and maintainability of the RL infrastructure codebases
Manage the engineering backlog and tactical execution for core RL framework development, ensuring progress is tracked and aligned with the team's strategic roadmap
Build effective working relationships with engineering teams, guiding alignment on project goals, managing interdependencies, and ensuring clear communication and risk management
Contribute to the broader program management of the Frameworks and Infrastructure team, providing clear stakeholder updates and supporting team-wide events

Qualifications

Reinforcement Learning, Performance Optimization, Reliability Engineering, Project Management Tools, Analytical Skills, Interpersonal Skills, Communication Skills

Required

Bachelor's degree in a technical field or equivalent practical experience
5 years of experience in program or project management in a technical software environment
Experience working directly with engineering teams on the software development lifecycle

Preferred

5+ years of relevant work experience
Experience with machine learning workflows, particularly in training, post-training, or MLOps. Direct experience with Reinforcement Learning (RL) is a strong plus
Strong analytical skills, with experience in performance analysis, reliability engineering (SRE), or technical efficiency projects
Proficiency with project management and development tools (e.g., Jira, Gantt charts) for managing technical backlogs
Excellent interpersonal and communication skills, with a demonstrated ability to work effectively in ambiguous, fast-paced R&D environments

Benefits

Bonus
Equity
Benefits

Company

Google DeepMind

Google DeepMind aims to research and build safe artificial intelligence systems to solve intelligence and advance science and humanity. It is a sub-organization of Google.

Funding

Current Stage: Late Stage
Total Funding: unknown
2014-01-26: Acquired
2011-02-01: Series A

Leadership Team

Demis Hassabis
Co-Founder & CEO
Aaron Saunders
VP of Hardware Engineering, Robotics
Company data provided by Crunchbase