LineVision · 5 hours ago
Senior Site Reliability Engineer
LineVision is a grid-enhancing technology company enabling electric utilities to deliver affordable, reliable power and accelerate the electrification of the global economy. They are seeking a Senior Site Reliability Engineer to establish their dedicated SRE practice and ensure the reliability of their grid intelligence platform. The role involves developing observability frameworks, incident response protocols, and improving deployment processes to enhance grid operations.
Analytics, Big Data, Clean Energy, Power Grid
Responsibilities
Establish and maintain Service Level Objectives (SLOs) and observability frameworks for critical services supporting utility grid operations
Implement CI/CD guardrails including canary deployments, automated rollbacks, and pre-production validation to improve deployment reliability
Develop comprehensive incident response procedures with documented runbooks, escalation paths, and blameless post-incident review processes
Partner with platform, engineering, and customer support teams to instrument systems and build reliability capabilities where they deliver maximum impact
Design and implement monitoring dashboards tracking SLA compliance, reliability metrics, and error budgets
Complete a comprehensive assessment of LineVision's current infrastructure, identifying critical services requiring immediate observability improvements
Establish baseline SLOs for top-priority services and implement initial monitoring dashboards in partnership with platform and support teams
Document current deployment processes and incident response procedures, identifying gaps and quick-win improvements
Deploy a production-ready observability framework covering all critical customer-facing services, with alerts configured for key reliability signals
Implement CI/CD improvements including automated testing gates, canary deployments, and rollback capabilities for core platform services
Lead 3+ blameless post-incident reviews, establishing templates and processes that become standard practice across engineering
Achieve measurable improvements in deployment success rates and mean time to recovery (MTTR) through implemented SRE practices
Build strong cross-functional partnerships resulting in proactive reliability improvements identified through error budget monitoring
Establish LineVision's SRE practice as a recognized capability, with documentation, runbooks, and processes that can scale with company growth
Qualifications
Required
Critical Thinking: Lead problem-solving efforts around complex reliability challenges, consistently applying critical thinking to identify root causes and prevent future incidents
Taking Ownership: Lead reliability projects with minimal supervision, taking full ownership of SRE practice development and system observability outcomes
Stakeholder Management: Manage relationships across engineering, platform, and support teams, providing clear updates on reliability metrics and leveraging influence to align on SRE priorities
Delivering Innovative Solutions: Lead implementation of modern SRE practices, inspiring teams to think creatively about reliability challenges in a utility infrastructure context
AWS Expertise: Strong experience with core AWS services including EC2, RDS, Lambda, and networking/VPC configuration for production environments
Observability & Monitoring: Hands-on proficiency with tools like Datadog, Prometheus, Grafana, or CloudWatch for instrumenting distributed systems
Infrastructure as Code: Experience with Terraform, CloudFormation, or Pulumi for managing and versioning infrastructure
Programming: Python and TypeScript experience for automation, tooling, and system instrumentation
SLO/SLA Frameworks: Demonstrated experience establishing Service Level Objectives and tracking error budgets
Preferred
Background in energy, utility, or critical infrastructure sectors where reliability directly impacts public services
AWS certifications demonstrating deep platform expertise
Experience with security compliance frameworks (NERC CIP, ISO 27001, SOC 2) relevant to utility operations
Track record of building SRE practices from the ground up in fast-growing technical organizations
Benefits
10% bonus
Equity
Company
LineVision
LineVision equips utilities with unique monitoring and analytics that improve the capacity, resilience, and safety of the grid.
H1B Sponsorship
LineVision has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2022 (2)
Funding
Current Stage: Growth Stage
Total Funding: $49.8M
Key Investors: Climate Innovation Capital, S2G Investments, UP Partners, Joules Accelerator
2022-10-03: Series C · $33M
2021-04-07: Series B · $12.5M
2021-03-04: Grant
Company data provided by Crunchbase