Aviatrix · 4 hours ago
SRE - Staff Engineer (P4)
Cloud Computing · Cloud Infrastructure
H1B Sponsor Likely
Responsibilities
Ensure Reliability and Availability: You will ensure uptime for crucial services and systems based on business-required SLOs, and minimize service disruptions through proactive monitoring, capacity planning, and fault-tolerant design.
Architecture and System Design: You will design and architect complex, scalable, and reliable systems.
Automation and Efficiency: You will develop and implement automation tools and frameworks that automate routine tasks, reduce human error, and streamline operational processes to increase efficiency.
Build Observability and Monitoring Tools: You will define, build, deploy, maintain, and extend our observability and monitoring tools to enhance system reliability and availability.
Incident Management and Response: You will maintain an effective on-call rotation to ensure 24/7 coverage. You will follow incident response procedures to swiftly address and mitigate service disruptions.
Performance Monitoring and SLIs/SLOs: You will help define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to set clear expectations for system performance.
Collaboration: You will work closely with product engineering to ensure service-level objectives and reliability targets are met.
Problem-Solving & Troubleshooting: You will respond to escalations by troubleshooting complex system and application incidents, performing root cause analysis, and implementing necessary corrective actions.
Thought Leadership and Innovation: You will stay up to date with the latest industry trends and emerging technologies, and iterate on best practices to increase the quality and velocity of development and deliverables.
Qualification
Required
8+ years of experience maintaining and deploying highly available, fault-tolerant systems at scale.
Proficiency in Golang or Python is required.
Infrastructure-as-code (IaC): Deep understanding of Terraform core components, with real-world experience using Terraform for infrastructure provisioning and management (Terragrunt is a bonus).
Experience with at least one cloud service provider (e.g., AWS, GCP, Azure, OCI).
Good knowledge of Kubernetes (cdk8s and operators are a bonus).
Solid experience developing automation tools and frameworks.
Experience with logging solutions (e.g., Loki, Syslog, Elasticsearch, Logstash, Kibana, Filebeat, Fluent Bit).
Experience with monitoring and metrics solutions (e.g., Prometheus, Grafana, VictoriaMetrics).
Practical experience with Linux system administration.
Experience with version control systems (e.g., Git, GitHub) and code review.
Excellent communication skills are required.
Benefits
Medical, dental and vision coverage
401(k) match
Short and long-term disability
Life/AD&D insurance
$1,000/year education reimbursement
Flexible vacation policy
Company
Aviatrix
Aviatrix® is the cloud networking expert. It is on a mission to make cloud networking simple so companies stay agile.
H1B Sponsorship
Aviatrix has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2023 (20)
2022 (42)
2021 (30)
2020 (15)
Funding
Current Stage: Late Stage
Total Funding: $340.8M
Key Investors: TCV, General Catalyst, CRV
2022-12-08: Series Unknown
2021-09-08: Series E · $200M
2021-02-23: Series D · $75M
Company data provided by Crunchbase