
Five9 · 2 days ago

Site Reliability Engineer

Five9 is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide. They are seeking a Site Reliability Engineer (SRE) to help build and maintain highly reliable, scalable systems, focusing on automation, monitoring, and system reliability. The role combines software engineering and operations expertise to ensure services meet reliability targets while enabling rapid development and deployment.

Call Center · Customer Service · Enterprise Software · SaaS · Sales
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Observability & Monitoring
Dashboards & Metrics: Design and implement comprehensive dashboards covering OS/platform-level and application-level monitoring, organized into primary indicators (RED: rate, errors, duration) and secondary indicators (USE: utilization, saturation, errors)
Availability & Reliability: Establish and maintain SLIs (Service Level Indicators), SLOs (Service Level Objectives), and error budgets for the service
Performance Monitoring: Build alerting systems and performance monitoring to proactively identify and resolve issues before they impact users
Incident Response: Participate in on-call rotations and lead incident response efforts, including post-mortem analysis and remediation. Maintain the official on-call routing. Assign application-level problems to the engineering team and track them to resolution
Infrastructure Automation & Deployment
CI/CD Pipeline Management: Maintain continuous integration and deployment pipelines in collaboration with our cloud and on-premises deployment teams
Infrastructure as Code: Develop and maintain infrastructure using tools like Terraform, Ansible, or similar
Configuration Management: Automate system configuration and ensure consistency across environments. Recommend and implement best practices for configuration control
Security & Compliance
Security Automation: Ensure security scanning systems are in place and review escalated vulnerabilities
Access Control: Maintain proper authentication, authorization, and audit logging systems
Compliance Reporting: Ensure systems meet regulatory requirements and industry standards
Security Incident Response: Participate in security incident response and remediation efforts
Cost Optimization
Resource Management: Monitor and optimize cloud resource usage and costs, watching for both planned and unplanned resource changes
Capacity Planning: Analyze usage patterns and plan for future capacity needs
Cost Analysis: Provide recommendations for cost-effective architecture and resource allocation
Right-sizing: Implement automated scaling and resource optimization strategies
Common Services & Platform Engineering
Shared Infrastructure: Build and maintain common services such as notification systems, caching layers, message queues, and third-party software stacks
Database Operations: Manage database reliability, performance, and scaling (where not handled by dedicated DB teams)
Service Mesh & Networking: Implement and maintain service discovery, load balancing, and network policies
Developer Tools: Create and maintain tools and platforms that improve developer productivity and system reliability

Qualifications

Production Systems · Cloud Platforms · Containerization · Monitoring & Observability · Programming Languages · Infrastructure as Code · SLI/SLO Management · Database Systems · System Administration · Networking · Version Control · Error Budget Policy · Toil Reduction · Capacity Planning

Required

3+ years managing large-scale production environments
Comfortable with 24/7 on-call responsibilities and incident response
Strong Linux/Unix system administration skills
Understanding of TCP/IP, DNS, load balancing, and network security
Experience with SQL and NoSQL databases in production environments
Proficiency in at least two of: Python, Shell, PHP, Java, or similar languages
Experience with one of AWS, GCP, or Azure infrastructure and services
Hands-on experience with Docker, Kubernetes, and container orchestration
Experience with Prometheus, Grafana, ELK stack, or similar tools
Proficiency with Terraform, CloudFormation, or similar tools
Expert-level Git usage and collaborative development practices
Experience defining and maintaining service level objectives
Understanding of error budget concepts and implementation
Track record of identifying and eliminating repetitive manual work
Experience with performance testing and capacity management

Preferred

Bachelor's degree in Computer Science, Engineering, or equivalent experience
Experience with microservices architecture and distributed systems
Knowledge of security best practices and compliance frameworks
Experience with chaos engineering and reliability testing
Previous experience in an SRE or DevOps role at a technology company
Contributions to open-source projects or technical communities

Benefits

Health, dental, and vision coverage, beginning on the first day of employment.
Five9 covers 100% of the employee portion of the health, dental, and vision coverage and pays a large share of the dependent cost.
Short & Long-Term Disability
Basic Life Insurance
401(k) savings plan with employer matching
Access to an innovative mental health support platform that offers personalized care and resources such as therapy, coaching, and self-guided mindfulness exercises for all covered employees and their covered dependents
Generous employee stock purchase plan
Paid Time Off
Company paid holidays
Paid volunteer hours
12 weeks paid parental leave

Company

Five9 is a cloud-based call center software company that specializes in sales, marketing, and customer service. It is a sub-organization of Five Rivers Solutions.

H1B Sponsorship

Five9 has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (13)
2024 (15)
2023 (13)
2022 (20)
2021 (15)
2020 (15)

Funding

Current Stage
Public Company
Total Funding
$861.6M
Key Investors
Oaktree Specialty Lending · Sapphire Ventures · Adams Street Partners
2024-02-28 · Post-IPO Debt · $747.5M
2014-04-04 · IPO
2014-03-10 · Debt Financing · $30M

Leadership Team

Jonathan Rosenberg
CTO and Head of AI
Jim Alexander
VP, PMO and Quality Engineering
Company data provided by Crunchbase