LoadSpring Solutions · 2 days ago
Site Reliability Engineer
Responsibilities
Help design, deploy, and maintain application monitoring to ensure all applications and sites are monitored.
Respond to application, CPU, and memory alerts, find their root causes, and, where possible, provide permanent solutions.
Educate users on application usage that causes performance degradation.
Help create and maintain a culture of continuous improvement within the SRE team and the broader organization.
QA newly deployed applications to ensure consistency and a great customer experience.
Leverage the DevOps model to manage the deployment of new platform releases.
Analyze and recommend solutions for production performance and availability issues.
Create knowledgebase articles to record and document permanent fixes to customer challenges.
Collaborate with the Software Implementation team to update deployment documentation with any new best practices.
Demonstrate excellent verbal and written communication skills with customers and internal teams.
Effectively complete training within the timeframe required by the business.
Maintain current knowledge of technological innovations and trends.
Take a lead role in any site outage, leading the incident response and the postmortem process.
Develop automated recovery plans for sites and applications.
Develop automated quality assurance processes to provide a consistent environment.
Follow Change Management processes to implement configuration changes.
Follow Problem Management processes to troubleshoot and resolve recurring issues.
Participate in the on-call rotation to ensure 24x7 support of IT operations.
Act as a mentor within the SRE team and broader organization, providing guidance, training, and knowledge sharing.
Qualifications
Required
4 to 6 years of Site Reliability Engineering, Sys Admin, or similar applicable experience.
Technically proficient in one or more scripting languages: PowerShell, Bash, Python, etc.
Technically proficient with application monitoring tools and troubleshooting techniques.
Capable of independently handling complex technical issues and problems.
Provides coaching and mentoring to subordinate team members to upskill their capabilities.
Strong problem-solving skills.
Analytical thinking skills.
Detail oriented.
Good time management skills.
Resource coordination and delegation.
Self-motivated with sound judgment and a bias toward action.
Bachelor's degree.
Preferably located in the Eastern Time Zone.
United States Citizenship or Legal United States Permanent Resident status.
Benefits
Health
Dental
Vision
Life
Disability
401k with a match
Company
LoadSpring Solutions
LoadSpring is the market leader in providing cloud-based project management solutions that are quick and simple to deploy and implement.