Russell Tobin · 1 month ago
Systems Reliability Engineer || Contract role || Remote
Russell Tobin is seeking a Systems Reliability Engineer for a contract role. The role involves contributing to SRE strategy, mentoring teams, and implementing best practices for system reliability and automation.
Consulting · Human Resources · Legal · Staffing Agency
Responsibilities
Contribute to the SRE strategy and establish best practices for release management, automation, and system reliability
Mentor and guide SRE, Engineering, and Product teams in adopting core SRE principles such as service ownership, reducing toil, and continuous improvement
Lead initiatives across SLIs/SLOs, observability, incident management, and postmortem practices, ensuring insights and learnings are captured and acted upon
Champion SRE practices by implementing repeatable templates for logging, monitoring, and alerting frameworks
Drive observability and monitoring excellence using tools such as Grafana, AppDynamics (AppD), and Sumo Logic, ensuring proactive detection and resolution of issues
Partner with engineering to design reliable, fault-tolerant systems and reduce operational toil through automation
Implement and leverage the Ansible Automation Platform to help teams automate infrastructure provisioning, configuration management, and event-driven workflows
Enable teams to automate operational events and infrastructure changes, reducing manual intervention and improving system resilience
Exercise sound judgment to ensure operational compliance with security, privacy, audit, disaster recovery, and other company requirements
Qualifications
Required
Minimum of 5 years of experience in Site Reliability Engineering, IT operations, or related fields
Bachelor's degree in computer science, engineering, or equivalent experience (2 additional years in lieu of degree)
Technical expertise in system reliability, scalability, application design, and performance
Hands-on experience with observability and monitoring tools such as Grafana, AppDynamics, and Sumo Logic
Experience with automation platforms, particularly Ansible, for infrastructure and event-driven automation
Proven ability to mentor and guide engineers in adopting SRE practices and principles
Excellent communication and collaboration skills across diverse teams and vendors
Strong judgment and problem-solving capabilities
Experience working in multi-cloud environments
Strong interpersonal, organizational, communication, and customer service skills
Must be authorized to work in the U.S.
Preferred
Experience applying ITIL, SRE, and IT process best practices
Experience in tracking major incidents, rollbacks, and hotfixes; leading root cause analysis (RCA) processes; and ensuring resolution and completion of action items
Experience with technical engineering in IT operations
Benefits
Comprehensive healthcare coverage (medical, dental, and vision plans)
Supplemental coverage (accident insurance, critical illness insurance, and hospital indemnity)
401(k) retirement savings
Life & disability insurance
An employee assistance program
Legal support
Auto and home insurance
Pet insurance
Employee discounts with preferred vendors
Company
Russell Tobin
Russell Tobin is a staffing and recruiting company that provides recruitment and staffing advisory services.