American Unit, Inc
DevOps Engineer
American Unit, Inc is seeking a Senior Site Reliability Engineer (SRE) with extensive experience in AWS infrastructure and automation. The role involves ensuring the resilience and efficiency of cloud-native systems while implementing best practices in monitoring, incident response, and collaboration with development teams.
Responsibilities
Design, implement, and maintain scalable, secure, and highly available infrastructure on AWS
Develop and improve CI/CD pipelines and Infrastructure as Code (IaC) using Terraform and Harness
Own and implement monitoring, alerting, logging, and distributed tracing with tools like Dynatrace or Datadog
Troubleshoot production incidents, conduct blameless postmortems, and improve incident response processes
Optimize systems for cost, performance, and reliability
Drive chaos engineering and resilience testing
Collaborate with development teams to embed SRE practices like SLAs, SLOs, and error budgets
Mentor junior SREs and promote DevOps/SRE culture across the organization
Qualifications
Required
Strong experience in SRE, DevOps, or Cloud Engineering
Expertise in AWS core services (EC2, ECS/EKS, Lambda, S3, VPC, RDS, IAM, CloudFront, etc.)
Hands-on experience with Terraform, Ansible, or other IaC tools
Strong scripting/coding skills (Python, Go, Shell, etc.)
Experience with Kubernetes, containerization, and orchestration
Deep knowledge of Linux systems and networking
Preferred
Experience with Service Meshes (e.g., Istio, App Mesh)
Familiarity with AWS Well-Architected Framework
Experience building self-healing systems and automated remediation
Background in security, compliance, or multi-account/multi-region AWS architectures
AWS Certified DevOps Engineer – Professional
AWS Certified Solutions Architect – Professional