TAG - The Aspen Group · 4 months ago
Senior Site Reliability Engineer
The Aspen Group (TAG) is one of the largest and most trusted retail healthcare business support organizations in the U.S., dedicated to improving healthcare experiences. They are seeking a Senior Site Reliability Engineer to ensure the reliability, performance, and scalability of their systems, implementing monitoring solutions, responding to incidents, and optimizing performance to meet business objectives.
Cosmetics, Dental, Health Care, Wellness
Responsibilities
Design, build, and maintain scalable and reliable systems to support our applications and services
Develop and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure systems meet reliability targets
Drive improvements in system reliability, availability, and performance through proactive measures and automation
Implement and manage comprehensive monitoring and alerting solutions to ensure full visibility into system health and performance
Develop and maintain dashboards and reporting tools that provide actionable insights for troubleshooting and performance optimization
Evaluate and integrate new monitoring tools and technologies as needed to enhance observability
Lead and participate in incident response efforts, including troubleshooting, root cause analysis, and resolution
Develop and maintain incident management processes to improve response times and minimize service disruptions
Conduct post-incident reviews to identify areas for improvement and implement preventive measures
Analyze performance metrics and logs to identify and address bottlenecks and inefficiencies in the system
Collaborate with development teams to optimize code and infrastructure for better performance and reliability
Perform capacity planning to ensure systems can handle current and future loads
Develop and implement automation solutions to streamline operations and reduce manual intervention
Identify and drive process improvements to enhance operational efficiency and effectiveness
Maintain documentation related to monitoring, incident management, and SRE best practices
Work closely with engineering, operations, and product teams to align on reliability and monitoring goals
Communicate effectively with stakeholders, providing regular updates on system health, incidents, and performance improvements
Foster a culture of collaboration and knowledge sharing within the team and across the organization
Qualifications
Required
Bachelor's degree in Computer Science or a related field
At least 5 years of experience in Site Reliability Engineering or a similar role
Strong proficiency in at least one programming language such as Python, Java, or Go
Experience with containerization technologies such as Docker and Kubernetes
Strong understanding of networking, distributed systems, and cloud infrastructure
Familiarity with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, and Splunk
Excellent problem-solving skills and the ability to work independently and in a team environment
Experience with incident management and root cause analysis
Benefits
Paid time off
Health
Dental
Vision
401(k) savings plan with match
Company
TAG - The Aspen Group
When we launched Aspen Dental, we set out to break down the barriers that made it hard for patients to keep up with their dental health — affordability, transparency, and access.
H1B Sponsorship
TAG - The Aspen Group has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2024 (3)
2023 (20)
2022 (16)
2021 (14)
2020 (7)
Funding
Current Stage: Late Stage
Company data provided by Crunchbase