Air Apps · 1 day ago
Site Reliability Engineer (SRE)
Air Apps is a family-founded company on a mission to create the world’s first AI-powered Personal & Entrepreneurial Resource Planner. As a Site Reliability Engineer (SRE), you will ensure the reliability, availability, and scalability of systems by implementing automation, monitoring, and performance optimization strategies.
Mobile Apps · Software
Responsibilities
Design and implement scalable, reliable, and fault-tolerant systems across cloud environments
Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK)
Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation
Optimize system performance, scalability, and incident response workflows to improve uptime
Work closely with development and DevOps teams to improve system design for reliability
Conduct root cause analysis (RCA) and implement preventative measures to minimize failures
Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies
Improve CI/CD pipelines to enhance deployment speed while maintaining stability
Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP)
Participate in on-call rotations to quickly address system failures and minimize downtime
Qualifications
Required
4+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering
Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud-native architectures
Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic)
Proficiency in Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Pulumi
Hands-on experience with containerization and orchestration (Docker, Kubernetes, Helm)
Strong Linux system administration and networking fundamentals
Experience with incident management, debugging, and root cause analysis
Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring
Knowledge of load balancing, failover strategies, and distributed systems
Understanding of security best practices, access control, and compliance requirements
Strong communication skills and the ability to collaborate with cross-functional teams
Benefits
Apple hardware ecosystem for work
Annual bonus
Medical insurance (including vision and dental)
Short- and long-term disability insurance
401(k) with up to 4% contribution
Air Conference – an opportunity to meet the team, collaborate, and grow together
Transportation budget
Free meals at the hub
Gym membership
Company
Air Apps
Independent iOS mobile apps studio. Making people’s lives easier, empowering them to lead lives freely. Weightless, like Air.
Funding
Current Stage
Growth Stage
Recent News
International Business Times
2025-06-18
Company data provided by Crunchbase