PRO IT USA · 3 days ago
Site Reliability Engineer
Big Data, Cloud Computing
Responsibilities
Collaborate with cross-functional teams to design, implement, and maintain highly available and scalable production systems.
Monitor system performance, identify bottlenecks, and proactively take action to prevent downtime and ensure optimal user experience.
Implement automation for provisioning, deployment, and configuration management to increase efficiency and reduce manual intervention.
Participate in incident response and post-incident analysis, driving continuous improvement in system reliability and recovery processes.
Conduct capacity planning and performance testing to ensure our systems can handle anticipated growth and unexpected traffic spikes.
Troubleshoot complex technical issues across the entire technology stack, from application code to infrastructure.
Drive the adoption of best practices in software development, system architecture, and infrastructure management.
Collaborate with development teams to improve application reliability, performance, and observability through code reviews and guidance.
Contribute to the on-call rotation and actively engage in identifying and addressing root causes of incidents.
Stay up to date with industry trends, emerging technologies, and SRE best practices, and bring fresh ideas to the team.
Qualifications
Required
Proficiency in at least one programming or scripting language, such as PowerShell or C#.
Strong experience with cloud platforms (Azure) and containerization technologies (Docker, Kubernetes).
Solid understanding of networking concepts, protocols, and security principles.
Experience with configuration management tools and infrastructure-as-code practices.
Familiarity with Azure monitoring and observability tools.
Ability to analyze complex systems and troubleshoot issues systematically.
Excellent communication skills and ability to work collaboratively in a team-oriented environment.
Preferred
Prior experience with incident response, on-call rotations, and incident management is a plus.
Relevant certifications, such as DevOps Engineer or Azure Administration, or their equivalents, are a plus.
Azure: 1 year (Preferred)
AWS: 1 year (Preferred)
Kubernetes: 1 year (Preferred)
Benefits
401(k)
Dental insurance
Health insurance
Company
PRO IT USA
PRO IT USA is a solutions firm that builds and manages expert teams in technology.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase