Staff Reliability Engineer jobs in United States

The Hartford · 6 hours ago

Staff Reliability Engineer

The Hartford is an insurance company dedicated to making a difference and is seeking a highly skilled Senior Reliability Engineer to join their Enterprise Data Organization. This role focuses on ensuring the reliability, performance, and scalability of their foundational data infrastructure and applications, while driving the transition to a modern Reliability Engineering model through automation and standardized service management.

Auto Insurance · Commercial Insurance · Employee Benefits · Finance · Financial Services · Insurance · Life Insurance · Property Insurance
No H1B

Responsibilities

Platform Reliability & Resiliency: Design, build, and maintain highly reliable, scalable, and resilient cloud-based data platforms on AWS and GCP, including core infrastructure and services like Snowflake, EKS, OpenSearch, EMR and Hadoop ecosystems
Automation & Toil Reduction: Champion the RE mandate by identifying manual, repetitive operational tasks (toil) and developing robust automation solutions to eliminate them. This includes automating provisioning, deployment, self-healing and operational tasks
Observability & Monitoring: Implement and manage comprehensive observability solutions (monitoring, alerting, logging, tracing) for the underlying data infrastructure and applications, focusing on establishing clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
Incident Response & Management: Act as an escalation point for production incidents, leading incident response, performing deep root cause analysis (RCA), designing error budgets and implementing preventative measures to ensure issues do not recur
Standardization & Documentation: Lead the standardization of operational processes and documentation, including the creation and automation of dynamic runbooks and playbooks for consistent and efficient incident resolution and service management
RE Transition: Lead as the RE Subject Matter Expert and collaborate with other Platform, Product, and Data Engineering Support teams to instill RE best practices, including participation in system design consulting, capacity planning, and deployment pipelines (CI/CD)

Qualifications

Reliability Engineering · Cloud Platforms · Automation · Infrastructure as Code · Monitoring Systems · DataOps Practices · Machine Learning Principles · Scripting Skills · Industry Certifications · Soft Skills

Required

10+ years of overall experience in an Infrastructure, Data, or related technology organization, with increasing responsibilities as a hands-on technologist
Must have 5+ years of experience as an RE, Cloud, or DevOps Engineer, or in a similar role, supporting large-scale enterprise infrastructure and applications
Strong scripting and programming skills (Python etc.) for automation and tooling development
Experience with infrastructure-as-code (e.g., Terraform, CloudFormation, Ansible) and CI/CD tools
Experience designing and operating reliable and resilient infrastructure, fail-safe patterns, reliability controls, and observability from a Reliability Engineering (SRE/RE) infrastructure support perspective across cloud and big data platforms (AWS, GCP, Amazon EMR, Hadoop/Spark, OpenSearch, and container orchestration platforms etc.)
Familiarity with cloud-native integrations with databases, data integration, and business intelligence platforms (Snowflake, Informatica IDMC, Tableau, and ThoughtSpot etc.)
Expertise in setting up and tuning monitoring and alerting systems (e.g., Dynatrace, Splunk, Prometheus, Grafana, Datadog, OpenTelemetry)
Expertise in defining and implementing DataOps practices
Expertise implementing AIOps to monitor, manage, and self-heal infrastructure and data platforms; experience implementing machine learning principles for anomaly detection, alerting, and runbook automation
Experience with prompt engineering, implementing AWS or Google AI services, and AI-enabled automation for infrastructure reliability and performance management
Candidates must be authorized to work in the US without company sponsorship

Preferred

Relevant industry certifications preferred (AWS, GCP, Kubernetes, SRE/DevOps frameworks etc.)

Benefits

Short-term or annual bonuses
Long-term incentives
On-the-spot recognition

Company

The Hartford

The Hartford is an industry-leading provider of property and casualty insurance, group benefits, and mutual funds.

Funding

Current Stage: Public Company
Total Funding: unknown
IPO: 1995-12-15

Leadership Team

Christopher Swift
Chief Executive Officer
Company data provided by Crunchbase