Broad Reach Partners · 4 hours ago
Senior SRE / DevOps Engineer (Atlanta)
Broad Reach Partners is seeking a Site Reliability Engineer to enhance the stability, performance, and reliability of their production systems. The role involves collaborating with development, DevOps, and security teams to improve observability and optimize system performance, with a strong focus on troubleshooting in Kubernetes.
Staffing & Recruiting
Responsibilities
Maintain and enhance monitoring tools (New Relic, Graylog) for service health and performance metrics
Implement and maintain high-availability systems with capacity planning, performance optimization, and fault tolerance
Define and monitor Service Level Indicators, Objectives, and Agreements with teams
Deploy and manage Kubernetes workloads on AWS EKS(A) using Helm and ArgoCD
Automate operational processes to reduce manual interventions
Manage Kubernetes workloads on AWS EKS for secure and stable deployments
Participate in on-call rotation, troubleshoot production issues, and implement permanent fixes
Work with DevOps to improve CI/CD pipelines and with development teams to embed resilience and observability
Document operational runbooks, escalation procedures, and production playbooks
Qualifications
Required
8+ years of experience as a Site Reliability Engineer, or equivalent
Experience with tools like New Relic for monitoring and Graylog for logging
3+ years of experience with Amazon Web Services (AWS) or Microsoft Azure
3+ years of experience with Kubernetes clusters, including performance monitoring
Proficiency with public cloud environments (AWS preferred)
Proficiency in a scripting language such as Bash, Groovy, or Python
Excellent debugging and troubleshooting skills
Ability to prioritize tasks efficiently and independently under minimal supervision
Kubernetes troubleshooting is required, including a deep understanding of pods, nodes, networking, scaling, logs, and service-to-service communication
Deep understanding of SRE best practices and a strong ability to troubleshoot complex issues
Preferred
AWS Cloud certification
Familiarity with .NET applications
Knowledge of Terraform, Ansible, and monitoring tools
Company
Broad Reach Partners
At Broad Reach, we specialize in IT staffing solutions that are tailored to your business needs. We believe in the power of authentic connections.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase