Avaya · 2 days ago
Site Reliability Engineer (SRE) - Azure | DevSecOps | IaC | Governance | Observability
Avaya is an enterprise software leader that helps organizations forge unbreakable connections. They are seeking a Site Reliability Engineer (SRE) to drive stability, reliability, and performance across their Azure and GCP-based platforms, blending operational excellence with proactive incident management and collaboration with DevOps and Security teams.
Cloud Computing · Electronics · Information Services · Information Technology · Small and Medium Businesses · Software · Telecommunications · Wireless
Responsibilities
Serve as a key member of the 24×7 on-call rotation, responding to and managing incidents across production and pre-production environments
Lead incident bridges, coordinate root cause analysis (RCA), and ensure post-incident reviews drive systemic improvements
Maintain clear communication with cross-functional teams and leadership during major incidents
Build, tune, and maintain observability dashboards (Azure Monitor, GCP Operations Suite, Prometheus, Grafana, Datadog, Log Analytics)
Perform deep-dive troubleshooting of application and service-level issues using distributed tracing and log analysis (Grafana, Datadog) to pinpoint root causes beyond infrastructure
Define SLOs, SLIs, and error budgets to proactively identify and mitigate reliability risks before customer impact
Integrate AI-Ops tools for anomaly detection, predictive alerting, and automated incident correlation
Continuously enhance alert quality, reduce false positives, and automate runbooks for faster recovery
Analyze trends to prevent recurring issues and support teams in resilience engineering
Qualifications
Required
5+ years in Site Reliability, DevOps, Cloud Operations, or customer support roles
Demonstrated experience in application-level troubleshooting by analyzing logs and traces to identify bugs, performance bottlenecks, and error conditions
Expertise in Azure and GCP cloud operations and distributed system reliability
Understanding of Terraform, Ansible, and CI/CD pipelines (Jenkins, GitHub Actions)
Experience with observability and AI-Ops tools (Azure Monitor, GCP Operations Suite, Grafana, Prometheus, Datadog, etc.)
Solid grasp of incident management frameworks (P1–P3 handling, RCA, PIRs, on-call rotations)
Excellent analytical, troubleshooting, and communication skills
Benefits
Performance-related bonus
Company
Avaya
Avaya provides business communications and collaboration systems, applications, and services.
Funding
Current Stage: Public Company
Total Funding: $700M
2022-06-27 · Post-IPO Debt · $600M
2018-01-17 · IPO
2002-12-24 · Series Unknown · $100M
Company data provided by Crunchbase