Senior SRE / DevOps Engineer (Atlanta) jobs in the United States

Broad Reach Partners · 2 hours ago

Senior SRE / DevOps Engineer (Atlanta)

Broad Reach Partners is seeking a Site Reliability Engineer to enhance the stability, performance, and reliability of their production systems. The role involves collaborating with development, DevOps, and security teams to improve observability and optimize system performance, with a strong focus on troubleshooting in Kubernetes.

Staffing & Recruiting
No H1B
U.S. Citizen Only

Responsibilities

Maintain and enhance monitoring tools (New Relic, Graylog) for service health and performance metrics
Implement and maintain high-availability systems with capacity planning, performance optimization, and fault tolerance
Define Service Level Indicators (SLIs), Objectives (SLOs), and Agreements (SLAs) with teams, and monitor them
Deploy Kubernetes workloads to AWS EKS(A) and manage them using Helm and ArgoCD
Automate operational processes to reduce manual interventions
Manage Kubernetes workloads on AWS EKS for secure and stable deployments
Participate in on-call rotation, troubleshoot production issues, and implement permanent fixes
Work with DevOps to improve CI/CD pipelines and with development teams to embed resilience and observability
Document operational runbooks, escalation procedures, and production playbooks

Qualifications

Kubernetes, AWS, Site Reliability Engineering, Monitoring tools, Scripting languages, Debugging skills, Task prioritization, AWS Cloud certification, Terraform, Ansible

Required

8+ years of experience as a Site Reliability Engineer or in an equivalent role
Experience with tools like New Relic for monitoring and Graylog for logging
3+ years of experience with Amazon Web Services (AWS) or Microsoft Azure
3+ years of experience with Kubernetes clusters, including performance monitoring in Kubernetes
Proficiency with public cloud environments (AWS preferred)
Proficiency in a scripting language such as Bash, Groovy, or Python
Excellent debugging and troubleshooting skills
Ability to prioritize tasks efficiently and work independently with minimal supervision
Required Kubernetes troubleshooting skills, backed by a deep understanding of pods, nodes, networking, scaling, logs, and service-to-service communication
Deep understanding of SRE best practices and a strong ability to troubleshoot complex issues

Preferred

AWS Cloud certification
Familiarity with .NET applications
Knowledge of Terraform, Ansible, and monitoring tools

Company

Broad Reach Partners

At Broad Reach, we specialize in IT staffing solutions that are tailored to your business needs. We believe in the power of authentic connections.

Funding

Current Stage
Early Stage
Company data provided by Crunchbase