SRE Director/Observability Architect jobs in the United States
This job has closed.

Amaze Systems · 1 day ago

SRE Director/Observability Architect

Amaze Systems is seeking a highly experienced Observability Architect and Advisor to lead, design, and advise on enterprise-scale observability strategies for mission-critical platforms. The ideal candidate is a strong communicator and proven SRE leader who combines deep hands-on technical expertise with architectural leadership.

Digital Marketing, Mobile Apps, Web Development

Responsibilities

Act as a trusted Observability Advisor to engineering, SRE, and platform teams across enterprise environments
Architect and implement end-to-end observability solutions across complex, mission-critical systems (e.g., core banking, e-commerce, ERP, and airline systems)
Lead and mentor SRE teams, defining best practices for reliability, SLIs/SLOs, error budgets, and incident management
Design and implement code-level instrumentation using OpenTelemetry across distributed systems
Drive observability-as-code and infrastructure-as-code strategies using Terraform, Ansible, and CI/CD pipelines
Integrate observability platforms with cloud-native services, APIs, and AIOps capabilities
Enable advanced use cases such as anomaly detection, event correlation, root cause analysis, and predictive insights
Collaborate with application, platform, and cloud teams to ensure observability is embedded into system architecture from design to production
Communicate with executive and technical stakeholders, providing clear, authoritative guidance

Qualification

Site Reliability Engineering, Observability Platforms, Cloud Architecture, Java Programming, .NET Programming, Python Programming, OpenTelemetry, Infrastructure as Code, Terraform, Ansible, Communication Skills, Team Leadership, Continuous Learning

Required

A dynamic learner mindset, with exceptionally strong and authoritative communication skills
Willingness to travel 30–40%
5+ years of hands-on programming experience as a Developer/Programmer in Java, .NET, or Python (not a pure infrastructure or monitoring-only background)
Currently serving as an SRE Lead, managing a team of SREs supporting mission-critical platforms
Strong and practical understanding of Site Reliability Engineering (SRE) principles
10+ years of experience designing system architectures on one or more public clouds (AWS and/or Azure)
Deep expertise in two or more observability platforms, such as Dynatrace, AppDynamics, Datadog, or Splunk Observability, including custom integrations, APIs, AIOps features, and anomaly detection
Strong experience with open-source observability tools: Prometheus, Grafana, Elasticsearch
Proven background in OpenTelemetry, including code-level instrumentation
Strong experience implementing observability and infrastructure as code using Terraform, Ansible, and CI/CD tools (any modern platform)

Preferred

Advanced Cloud Solutions Architect certification
Understanding of eBPF and low-level system observability
Experience with cloud management and orchestration platforms
Hands-on experience using or custom-developing AIOps tools such as BigPanda, Moogsoft, ServiceNow AIOps, or Splunk ITSI
Development of custom utilities for advanced event correlation, visualization, and autonomous remediation and resolution
Understanding of how to build, deploy, and fine-tune AI/ML models on cloud-native platforms
Strong understanding of ServiceNow, including ITSM, ITOM, and programmatic and API-based integrations

Company

Amaze Systems

Amaze Systems is a web and digital marketing agency that offers data analytics and SEO services.

Funding

Current Stage
Late Stage
Company data provided by Crunchbase