KellyMitchell Group · 10 hours ago
Site Reliability Engineer
KellyMitchell Group is seeking a Site Reliability Engineer to join their team. The role involves designing, building, and operating reliable production systems while applying SRE principles to enhance system performance and resiliency.
Human Resources, Information Technology, Staffing Agency
Responsibilities
Design, build, and operate highly reliable, scalable, and secure production systems supporting commercial platforms
Apply SRE principles to improve system resiliency, availability, performance, and operational maturity
Implement and enhance observability solutions, including monitoring, logging, tracing, and telemetry
Lead and participate in on-call rotations, independently resolving moderately to highly complex production incidents
Diagnose and remediate system, application, and infrastructure performance bottlenecks
Implement and maintain security-focused reliability solutions, including bot mitigation and threat protection
Configure, tune, and optimize web platforms, containerized services, and distributed systems
Develop automation and tooling to reduce operational toil and improve incident response efficiency
Evaluate new application and infrastructure requirements for capacity, performance, reliability, and runtime best practices
Assess new technologies and platforms for technical feasibility, alignment with standards, and operational readiness
Author, document, and teach troubleshooting methodologies, operational standards, and best practices to the SRE team
Collaborate closely with application, platform, security, and infrastructure engineers to deliver resilient solutions
Qualifications
Required
Senior-level experience as a Site Reliability Engineer, Production Engineer, or Security-focused SRE
Strong programming and scripting skills in Python and/or Java (ability to build automated tooling and test coverage)
Hands-on experience with Akamai Kona Site Defender and bot mitigation strategies
Proven experience in security engineering, particularly web application protection and threat mitigation
Strong observability and monitoring experience using tools such as Splunk or similar platforms
Experience working with distributed systems and container platforms such as Kubernetes, ECS, Fargate, and GKE
Deep understanding of Linux and Windows systems administration, including performance monitoring and troubleshooting
Expertise in networking fundamentals and protocols
Experience with CI/CD pipelines and automation tools
Strong experience with cloud platforms, AWS preferred
Proficiency with web server technologies such as Nginx, Apache, Tomcat, and Node.js, including performance tuning and debugging
Experience with data platforms such as MySQL, NoSQL, Redis, and Elastic, including basic configuration and troubleshooting
Exceptional troubleshooting skills with a structured, methodical approach to incident resolution
Ability to quickly understand application behavior, traffic patterns, and security threats in production environments
Strong background in observability strategy design and implementation
Experience supporting high-traffic, customer-facing digital platforms
Familiarity with large-scale enterprise or consumer environments
Experience mentoring other engineers in SRE, incident response, and reliability best practices
Benefits
Medical, Dental, & Vision Insurance Plans
Employee-Owned Profit Sharing (ESOP)
401(k) offered
Company
KellyMitchell Group
KellyMitchell is an HR firm that provides IT and technical staffing to organizations globally.
H1B Sponsorship
KellyMitchell Group has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided here for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2021 (2)
2020 (2)
Funding
Current Stage
Late Stage
Company data provided by Crunchbase