AIG · 3 months ago
Service Reliability Engineer, GI Application Management
American International Group, Inc. (AIG) is a leading global insurance organization. The Site Reliability Engineer (SRE) role focuses on applying software engineering principles to IT operations, ensuring robust and scalable systems while prioritizing automation, monitoring, and incident management to enhance system reliability and user experience.
Banking · Financial Services · Insurance
Responsibilities
Maintain continuous uptime and accessibility of critical business applications and services. This involves actively monitoring system performance, detecting potential issues, and implementing strategies to prevent downtime
Respond to and resolve incidents and outages promptly. This includes troubleshooting problems, coordinating with other teams, and restoring service quickly
Automate repetitive, manual tasks (toil) to improve efficiency and reduce human error. This might involve scripting, developing tools, and improving infrastructure management processes
Establish and maintain robust monitoring and alerting systems to gain real-time insights into system health and performance. This allows for proactive detection of anomalies and potential issues
Analyze usage patterns and forecast resource needs to ensure that systems can handle expected growth and traffic spikes without performance degradation. This involves designing and implementing scalable architectures
After major incidents causing outages, conduct blameless post-mortem reviews to analyze the root causes of failures, document learnings, and implement corrective measures to prevent future occurrences
Act as a bridge between development and operations teams, working closely with developers to improve application architecture, incorporate reliability best practices into the development lifecycle, and ensure optimal delivery efficiency
Establish clear, measurable targets for system performance and reliability (Service Level Objectives, or SLOs), based on Service Level Indicators (SLIs). These indicators and objectives guide development and operations priorities to maintain high levels of user satisfaction
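As a rough illustration of how a Service Level Indicator relates to a reliability target, the sketch below computes an availability SLI from request counts and the share of the error budget still unspent. The function names, the 99.9% target, and the request counts are hypothetical values chosen for the example; they are not taken from AIG's systems or from this posting.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests served successfully (the SLI)."""
    if total_requests == 0:
        # No traffic: treated as fully available (a convention assumed here).
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget still unspent; negative means the target was missed."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # e.g. 0.001 for a 99.9% target
    actual_failure = 1.0 - sli
    return 1.0 - actual_failure / allowed_failure


# Hypothetical month: 1,000,000 requests, 500 of them failed.
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
remaining = error_budget_remaining(sli, slo_target=0.999)
print(f"SLI={sli:.4%}, error budget remaining={remaining:.0%}")
```

A remaining budget near zero is commonly read as a signal to prioritize reliability work over new releases, though the exact policy varies by team.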
Qualifications
Required
Bachelor's degree in related field and 3+ years of relevant technology experience, demonstrating progressive responsibility and leadership in overseeing regional technology teams
Solid grasp of core technical areas such as programming (Python, Go, Java are common), system administration (Linux/Unix), networking, databases, and cloud computing platforms (like AWS, Azure, GCP)
Practical experience running production systems, troubleshooting issues, and participating in on-call rotations is highly valued, building crucial intuition for real-world system failures
Proficiency in scripting languages (e.g., Python, Bash) and Infrastructure as Code (IaC) tools (e.g., Terraform, Ansible) is crucial
Must be skilled in implementing comprehensive monitoring solutions, leveraging tools like Prometheus, Grafana, or ELK Stack to track system health, detect anomalies, and set up alerts for potential issues before they impact users
Ability to quickly diagnose and resolve system incidents, minimize downtime, and implement solutions to prevent recurrence is paramount. This includes developing and adhering to incident response plans and conducting post-incident reviews (PIRs)
Ability to rely on data from metrics, logs, and other sources to understand system behavior, analyze performance, identify trends, and make informed decisions to improve system reliability
Excellent communication skills to articulate technical concepts, collaborate on projects, and foster a shared understanding of reliability goals
Proactive in learning new technologies, methodologies, and tools to adapt to changing environments and continuously improve their skills and the systems they manage
Benefits
Volunteer Time Off
Matching Grants Programs
Total Rewards Program
Company
AIG
AIG is a global insurance company providing insurance products to support clients in business and in life.
H1B Sponsorship
AIG has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (2)
2023 (4)
2022 (23)
2021 (5)
2020 (13)
Funding
Current Stage
Late Stage
Leadership Team
Recent News
Beinsure - Insurance, Reinsurance, InsurTech Insights
2025-11-26
Business Wire
2025-11-05
Company data provided by crunchbase