Site Reliability Engineer (SRE) jobs in United States

Air Apps · 1 day ago

Site Reliability Engineer (SRE)

Air Apps is a family-founded company on a mission to create the world’s first AI-powered Personal & Entrepreneurial Resource Planner. As a Site Reliability Engineer (SRE), you will ensure the reliability, availability, and scalability of systems by implementing automation, monitoring, and performance optimization strategies.

Mobile Apps · Software

Responsibilities

Design and implement scalable, reliable, and fault-tolerant systems across cloud environments
Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK)
Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation
Optimize system performance, scalability, and incident response workflows to improve uptime
Work closely with development and DevOps teams to improve system design for reliability
Conduct root cause analysis (RCA) and implement preventative measures to minimize failures
Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies
Improve CI/CD pipelines to enhance deployment speed while maintaining stability
Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP)
Participate in on-call rotations to quickly address system failures and minimize downtime

Qualifications

Site Reliability Engineering, Cloud platforms (AWS, Azure, GCP), Infrastructure as Code (IaC), Containerization, Orchestration, Monitoring tools, Linux system administration, Scripting (Bash, Python, Go), Observability, Incident management, Communication skills

Required

4+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering
Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures
Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic)
Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi
Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm)
Strong Linux system administration and networking fundamentals
Experience with incident management, debugging, and root cause analysis
Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring
Knowledge of load balancing, failover strategies, and distributed systems
Understanding of security best practices, access control, and compliance requirements
Strong communication skills and the ability to collaborate with cross-functional teams

Benefits

Apple hardware ecosystem for work.
Annual Bonus.
Medical Insurance (including vision & dental).
Short- and long-term disability insurance.
401(k) with up to 4% contribution.
Air Conference – an opportunity to meet the team, collaborate, and grow together.
Transportation budget.
Free meals at the hub.
Gym membership.

Company

Air Apps

Independent iOS mobile apps studio. Making people’s lives easier, empowering them to lead lives freely. Weightless, like Air.

Funding

Current Stage: Growth Stage
Company data provided by Crunchbase