UFG Insurance · 2 days ago

Site Reliability Engineer

United States

Full-time

Remote

Senior Level, Lead/Staff

$124K/yr - $163K/yr

10+ years exp

UFG Insurance is currently hiring for a Site Reliability Engineer who will be the senior-most engineer on the Production Management team, responsible for ensuring the reliability, performance, scalability, and efficiency of critical production systems and services. The role involves troubleshooting, operating, and enhancing distributed systems while providing guidance and support to technology teams across Business Enablement.

Financial Services · Insurance

H1B Sponsor Likely

Hiring Manager

Meghan Larsen, PHR

Responsibilities

Implement tooling to monitor system health, capacity, and performance at all levels, from hardware through the VMs and all the way to the end-user interface

Work with the production management team to troubleshoot incidents, restore service, and identify root causes

Recommend architectural and implementation changes to products delivered by development teams, based on how those products perform in test, performance, and production environments

Support continuous improvement of ITIL processes through automation, data-driven insights, and proactive problem identification

Document and integrate SRE practices into the ITIL framework, including incident, change, and problem management workflows

Develop automation for system provisioning, monitoring, deployment, and recovery to reduce manual effort and human error

Develop and maintain comprehensive runbooks, standard operating procedures (SOPs), and knowledge base articles for recurring operational tasks and incident response actions

Collaborate with development teams to design resilient architecture and implement best practices for reliability and observability

Enhance observability by developing and maintaining dashboards, alerts, and performance analytics

Contribute to capacity planning, performance tuning, and resilience testing to ensure system health

Develop and update problem management documentation, ensuring known errors and workarounds are captured within the ITSM system

Manage incident response and participate in on-call rotations to ensure service reliability

Define, document, and track key reliability metrics (SLIs, SLOs, SLAs) and implement continuous improvement initiatives

Drive post-incident reviews (PIRs) and develop actionable insights to prevent future occurrences

Partner with security teams to ensure systems meet compliance, security, and governance standards

Evaluate and recommend new tools, technologies, and frameworks to improve operational efficiency

Monitor network systems, servers, and applications

Use all necessary tools to investigate performance and reliability of systems in testing environments. Provide detailed and specific guidance on ways to eliminate bottlenecks, improve resilience, and optimize speed and reliability

Provide mentorship and technical support to other members of Production Management

Qualifications

Site Reliability Engineering · Monitoring tools · Automation · Scripting · Networking concepts · ITIL processes · SQL Server expertise · VM performance tuning · Communication skills · Problem-solving skills · Collaboration skills

Required

Bachelor's degree in Information Technology, Computer Science, or a related field, or equivalent experience

10+ years of experience in progressively more demanding enterprise-scale technology roles

3+ years of experience as a Site Reliability Engineer or Senior DevOps Engineer

3+ years in software development, architecture, or related engineering discipline

Advanced experience with multiple enterprise monitoring and observability tools, including Dynatrace, PRTG, DTrace, SolarWinds, and similar

Complete Windows fluency is mandatory; similar strengths in Linux and Unisys mainframe environments are helpful

Excellent problem-solving and communication skills, with the ability to collaborate across cross-functional teams

Unparalleled understanding of advanced networking concepts and complete expertise in the entire TCP/IP stack

VM (VMware and Hyper-V) and physical compute performance and tuning, including networking and storage performance

Runtime VM (Java, Python, browser, and similar VM environments) threading, garbage collection, and general performance

SQL Server expertise, including troubleshooting queries, indexes, and general performance

Experience with unstructured database performance

General understanding of LLM/SLM implementations and GPU implementations

Proficiency in automation and scripting languages

Good understanding of ITIL processes (Incident, Change, Problem, and Service Level Management)

Preferred

Master's or other advanced degree

Benefits

Annual incentive compensation

Medical, dental, vision & life insurance

Accident, critical illness & short-term disability insurance

Retirement plans with employer contributions

Generous time-off program

Programs designed to support employee well-being and financial security

Company

UFG Insurance

The United Fire Group (UFG) companies join together to offer a range of property/casualty products.

Founded in 1946

Cedar Rapids, Iowa, USA

501-1000 employees

https://www.ufginsurance.com/

H1B Sponsorship

UFG Insurance has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data from the US Department of Labor)

Trends of Total Sponsorships

2025 (1)

2024 (1)

2023 (3)

2021 (3)