Overstory · 2 weeks ago
Staff Machine Learning Operations Engineer
Overstory is tackling the climate crisis by leveraging innovative technology to enhance the resilience of electrical grids. As a Staff Machine Learning Operations Engineer, you will design and build the foundations of machine learning operations, ensuring model reliability and delivering value to customers through collaboration with data engineers and scientists.
Analytics, Artificial Intelligence (AI), Big Data, Machine Learning, Software
Responsibilities
Design, build, and maintain processes and systems such as:
Automated pipelines for training, testing, and deploying ML models
Experiment tracking systems for performance metrics, data and model versioning, and documentation
Processes and systems for the full model lifecycle, including registries, release and rollback strategies, and scalable model serving
Monitoring and alerting for prediction quality, system health, and cost optimization
Advocating for a balance between MLOps best practices and quick slices of value
Aligning technical solutions with customer needs by collaborating with both engineering and product
Ensuring our MLOps systems support regulatory, privacy, and security requirements
Qualifications
Required
10+ years of experience designing and building production-grade ML pipelines and systems (but don't filter yourself out if you feel you're a strong candidate with 5+ years)
Strong knowledge of experiment tracking, model deployment strategies, data versioning, and monitoring
Experience with ML infrastructure tools (e.g. MLflow, Kubeflow, Airflow, feature stores, model registries)
Strong communication skills and ability to align technical solutions with business goals
Comfortable making architectural decisions and balancing best practices with practical trade-offs
Preferred
Familiarity with GCP and Vertex AI
Experience in remote-first or globally distributed teams
Background in image processing, geospatial, or spatio-temporal data processing
Prior work on real-time prediction systems or active-learning loops
Knowledge of regulatory, privacy, or security considerations in ML
Experience optimizing cloud infrastructure costs for ML workloads
Familiarity with Overstory's mission domains (e.g. satellite imagery, forestry, utilities)
Benefits
Remote working budget
Educational budget
Time to develop new skills
Equity
Competitive salary
Company
Overstory
Overstory is AI-powered grid resilience software that helps electric utilities prevent wildfires and power outages.
Funding
Current Stage
Growth Stage
Total Funding
$68.07M
Key Investors
Blume Equity, B Capital, Convective Capital
2025-11-25 · Series B · $43M
2023-10-19 · Series A · $14M
2022-11-10 · Seed · $5.2M
Company data provided by crunchbase