Staff Reliability Engineer jobs in United States

The Hartford · 16 hours ago

Staff Reliability Engineer

The Hartford is an insurance company that is committed to making a difference and shaping the future. The Staff Reliability Engineer plays a critical role in maintaining the stability, performance, and scalability of systems and services, implementing best practices in reliability engineering, and mentoring team members.

Auto Insurance, Commercial Insurance, Employee Benefits, Finance, Financial Services, Insurance, Life Insurance, Property Insurance
No H1B

Responsibilities

Lead the design, implementation, and optimization of reliable systems and infrastructure
Collaborate with software engineering, operations, and product teams to ensure uptime and availability targets are met
Develop and maintain monitoring, alerting, and incident response strategies to detect and resolve issues quickly
Conduct root cause analysis of system failures and drive corrective actions to prevent recurrence
Advocate for reliability best practices and foster a culture of proactive risk mitigation across the organization
Mentor and provide technical guidance to other reliability engineers and cross-functional team members
Develop automation tools to enhance efficiency in deployment, monitoring, and recovery processes
Participate in capacity planning, performance testing, and disaster recovery exercises
Stay current with industry trends, emerging technologies, and best practices in reliability engineering

Qualifications

Reliability engineering, Cloud platforms, Container orchestration, Programming skills, Monitoring tools, Incident management, Infrastructure as code, Analytical skills, Troubleshooting skills, Security best practices, Compliance knowledge, High-availability architectures, Distributed systems, Certifications, Communication skills, Project leadership

Required

5+ years of experience in reliability engineering, site reliability engineering (SRE), or related roles
Expertise in cloud platforms (e.g., AWS, Azure, Google Cloud) and container orchestration (e.g., Kubernetes)
Strong programming skills in one or more languages (e.g., Python, Java)
Proven experience with logging and monitoring tools (e.g., Splunk, Dynatrace, Datadog) and incident management frameworks (e.g., ServiceNow)
Excellent analytical, troubleshooting, and communication skills
Ability to lead complex projects and influence stakeholders at all levels

Preferred

Experience with infrastructure as code (e.g., Terraform, CloudFormation)
Knowledge of security best practices and compliance requirements
Background in high-availability architectures and distributed systems
Certifications in cloud or reliability engineering domains are a plus

Benefits

Short-term or annual bonuses
Long-term incentives
On-the-spot recognition

Company

The Hartford

The Hartford is an industry-leading provider of property and casualty insurance, group benefits, and mutual funds.

Funding

Current Stage: Public Company
Total Funding: unknown
IPO: 1995-12-15

Leadership Team

Christopher Swift
Chief Executive Officer
Company data provided by Crunchbase