Amaze Systems
SRE Director/Observability Architect
Amaze Systems is seeking a highly experienced Observability Architect and Advisor to lead, design, and advise on enterprise-scale observability strategies for mission-critical platforms. The role requires a strong communicator and proven SRE leader who combines deep hands-on technical expertise with architectural leadership responsibilities.
Digital Marketing, Mobile Apps, Web Development
Responsibilities
Act as a trusted Observability Advisor to engineering, SRE, and platform teams across enterprise environments
Architect and implement end-to-end observability solutions across complex, mission-critical systems (e.g., core banking, e-commerce, ERP, and airline systems)
Lead and mentor SRE teams, defining best practices around reliability, SLIs/SLOs, error budgets, and incident management
Design and implement code-level instrumentation using OpenTelemetry across distributed systems
Drive observability-as-code and infrastructure-as-code strategies using Terraform, Ansible, and CI/CD pipelines
Integrate observability platforms with cloud-native services, APIs, and AIOps capabilities
Enable advanced use cases such as anomaly detection, event correlation, root cause analysis, and predictive insights
Collaborate with application, platform, and cloud teams to ensure observability is embedded into system architecture from design to production
Communicate with executive and technical stakeholders, providing clear, authoritative guidance
Qualifications
Required
A dynamic learner's mindset, with exceptionally strong and authoritative communication skills
Willingness to travel 30–40%
5+ years of hands-on programming experience as a Developer/Programmer in Java, .NET, or Python (not a pure infrastructure or monitoring-only background)
Currently serving as an SRE Lead, managing a team of SREs supporting mission-critical platforms
Strong and practical understanding of Site Reliability Engineering (SRE) principles
10+ years of experience designing system architectures on one or more public clouds (AWS and/or Azure)
Deep expertise in two or more observability platforms, such as Dynatrace, AppDynamics, Datadog, or Splunk Observability, including custom integrations, APIs, AIOps features, and anomaly detection
Strong experience with open-source observability tools: Prometheus, Grafana, Elasticsearch
Proven background in OpenTelemetry, including code-level instrumentation
Strong experience implementing observability as code and infrastructure as code using Terraform, Ansible, and CI/CD tools (any modern platform)
Preferred
Advanced cloud solutions architect certification
Understanding of eBPF and low-level system observability
Experience with cloud management and orchestration platforms
Hands-on experience with, or custom development of, AIOps tools such as BigPanda, Moogsoft, ServiceNow AIOps, and Splunk ITSI
Development of custom utilities for advanced event correlation, visualization, and autonomous remediation and resolution
Understanding of how to build, deploy, and fine-tune AI/ML models on cloud-native platforms
Strong understanding of ServiceNow, including ITSM, ITOM, and programmatic and API-based integrations
Company
Amaze Systems
Amaze Systems is a web and digital marketing agency that offers data analytics and SEO services.
Funding
Current Stage: Late Stage
Company data provided by Crunchbase