Staff Machine Learning Operations Engineer jobs in the United States

Overstory · 2 weeks ago

Staff Machine Learning Operations Engineer

Overstory is tackling the climate crisis by leveraging innovative technology to enhance the resilience of electrical grids. As a Staff Machine Learning Operations Engineer, you will design and build the foundations of machine learning operations, ensuring model reliability and delivering value to customers through collaboration with data engineers and scientists.

Analytics · Artificial Intelligence (AI) · Big Data · Machine Learning · Software

Responsibilities

Design, build, and maintain processes and systems such as:
  Automated pipelines for training, testing, and deploying ML models
  Experiment tracking systems for performance metrics, data and model versioning, and documentation
  Processes and systems for the full model lifecycle, including registries, release and rollback strategies, and scalable model serving
  Monitoring and alerting for prediction quality, system health, and cost optimization
Advocating for a balance between MLOps best practices and delivering value in quick, incremental slices
Aligning technical solutions with customer needs by collaborating with both engineering and product
Ensuring our MLOps systems support regulatory, privacy, and security requirements

Qualifications

Machine Learning Operations · Production-grade ML pipelines · Experiment tracking systems · ML infrastructure tools · GCP · Vertex AI · Architectural decision making · Background in image processing · Regulatory considerations · Optimizing cloud infrastructure costs · Communication skills · Collaboration · Adaptability

Required

10+ years of experience designing and building production-grade ML pipelines and systems (if you feel you're a strong candidate with 5+ years, don't filter yourself out)
Strong knowledge of experiment tracking, model deployment strategies, data versioning, and monitoring
Experience with ML infrastructure tools (e.g. MLflow, Kubeflow, Airflow, feature stores, model registries)
Strong communication skills and the ability to align technical solutions with business goals
Comfortable making architectural decisions and balancing best practices with practical trade-offs

Preferred

Familiarity with GCP and Vertex AI (not required)
Experience in remote-first or globally distributed teams
Background in image processing, geospatial, or spatio-temporal data processing
Prior work on real-time prediction systems or active-learning loops
Knowledge of regulatory, privacy, or security considerations in ML
Experience optimizing cloud infrastructure costs for ML workloads
Familiarity with Overstory's mission domains (e.g. satellite imagery, forestry, utilities)

Benefits

Remote working budget
Educational budget
Time to develop new skills
Equity
Competitive salary

Company

Overstory

Overstory builds AI-powered grid resilience software that helps electric utilities prevent wildfires and power outages.

Funding

Current Stage: Growth Stage
Total Funding: $68.07M
Key Investors: Blume Equity, B Capital, Convective Capital
2025-11-25 · Series B · $43M
2023-10-19 · Series A · $14M
2022-11-10 · Seed · $5.2M

Leadership Team

Indra den Bakker
CEO & Co-founder
Company data provided by Crunchbase