Senior Site Reliability Engineer jobs in United States

LineVision · 14 hours ago

Senior Site Reliability Engineer

LineVision is a grid-enhancing technology company enabling electric utilities to deliver affordable, reliable power and accelerate the electrification of the global economy. They are seeking a Senior Site Reliability Engineer to establish their dedicated SRE practice and ensure the reliability of their grid intelligence platform. The role involves developing observability frameworks, incident response protocols, and improving deployment processes to enhance grid operations.
Analytics, Big Data, Clean Energy, Power Grid
H1B Sponsor Likely

Responsibilities

Establish and maintain Service Level Objectives (SLOs) and observability frameworks for critical services supporting utility grid operations
Implement CI/CD guardrails including canary deployments, automated rollbacks, and pre-production validation to improve deployment reliability
Develop comprehensive incident response procedures with documented runbooks, escalation paths, and blameless post-incident review processes
Partner with platform, engineering, and customer support teams to instrument systems and build reliability capabilities where they deliver maximum impact
Design and implement monitoring dashboards tracking SLA compliance, reliability metrics, and error budgets
Complete a comprehensive assessment of LineVision's current infrastructure, identifying critical services requiring immediate observability improvements
Establish baseline SLOs for top-priority services and implement initial monitoring dashboards in partnership with platform and support teams
Document current deployment processes and incident response procedures, identifying gaps and quick-win improvements
Deploy a production-ready observability framework covering all critical customer-facing services, with alerts configured for key reliability signals
Implement CI/CD improvements including automated testing gates, canary deployments, and rollback capabilities for core platform services
Lead 3+ blameless post-incident reviews, establishing templates and processes that become standard practice across engineering
Achieve measurable improvements in deployment success rates and mean time to recovery (MTTR) through implemented SRE practices
Build strong cross-functional partnerships resulting in proactive reliability improvements identified through error budget monitoring
Establish LineVision's SRE practice as a recognized capability, with documentation, runbooks, and processes that can scale with company growth
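
As a rough illustration of two metrics named in the list above (error budgets and MTTR), here is a minimal Python sketch. It is not taken from LineVision's tooling: the `SloWindow` shape, the function names, and the request-based availability target are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one service over one SLO window (hypothetical shape)."""
    total_requests: int
    failed_requests: int
    slo_target: float  # assumed availability target, e.g. 0.999 for 99.9%


def error_budget_remaining(window: SloWindow) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    allowed_failures = (1.0 - window.slo_target) * window.total_requests
    if allowed_failures <= 0:
        # No traffic, or a 100% target: any failure overspends the budget.
        return 1.0 if window.failed_requests == 0 else float("-inf")
    return 1.0 - window.failed_requests / allowed_failures


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - detected) over a non-empty list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)
```

Under these assumptions, a service with a 99.9% target and 1,000,000 requests may fail roughly 1,000 of them in the window, so 250 failures would leave about three quarters of the budget. Real SLO tooling typically also tracks burn rate over several windows, which this sketch leaves out.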

Qualifications

AWS Expertise, Observability & Monitoring, Infrastructure as Code, SLO/SLA Frameworks, Programming, Stakeholder Management, Delivering Innovative Solutions, Critical Thinking, Taking Ownership

Required

Critical Thinking: Lead problem-solving efforts around complex reliability challenges, consistently applying critical thinking to identify root causes and prevent future incidents
Taking Ownership: Lead reliability projects with minimal supervision, taking full ownership of SRE practice development and system observability outcomes
Stakeholder Management: Manage relationships across engineering, platform, and support teams, providing clear updates on reliability metrics and leveraging influence to align on SRE priorities
Delivering Innovative Solutions: Lead implementation of modern SRE practices, inspiring teams to think creatively about reliability challenges in utility infrastructure context
AWS Expertise: Strong experience with core AWS services including EC2, RDS, Lambda, and networking/VPC configuration for production environments
Observability & Monitoring: Hands-on proficiency with tools like Datadog, Prometheus, Grafana, or CloudWatch for instrumenting distributed systems
Infrastructure as Code: Experience with Terraform, CloudFormation, or Pulumi for managing and versioning infrastructure
Programming: Python and TypeScript experience for automation, tooling, and system instrumentation
SLO/SLA Frameworks: Demonstrated experience establishing Service Level Objectives and tracking error budgets

Preferred

Background in energy, utility, or critical infrastructure sectors where reliability directly impacts public services
AWS certifications demonstrating deep platform expertise
Experience with security compliance frameworks (NERC CIP, ISO 27001, SOC 2) relevant to utility operations
Track record of building SRE practices from the ground up in fast-growing technical organizations

Benefits

10% bonus
Equity

Company

LineVision

LineVision equips utilities with unique monitoring and analytics that improve the capacity, resilience, and safety of the grid.

H1B Sponsorship

LineVision has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2022 (2)

Funding

Current Stage
Growth Stage
Total Funding
$49.8M
Key Investors
Climate Innovation Capital, S2G Investments, UP Partners, Joules Accelerator
2022-10-03 · Series C · $33M
2021-04-07 · Series B · $12.5M
2021-03-04 · Grant

Leadership Team

Jon Marmillo
Co-Founder & Chief Product Officer

Tim Stelzer
SVP, Chief Technology Officer
Company data provided by Crunchbase