This job has closed.

ShiftCode Analytics, Inc. · 1 month ago

Lead Observability Engineer

ShiftCode Analytics, Inc. is seeking a Lead Observability Engineer to design and operate end-to-end observability across hybrid private cloud and AWS environments. The role focuses on deep instrumentation, distributed tracing, and architectural observability patterns to enhance system performance and user experience.

Analytics, Consulting, Information Technology
No H1B

Responsibilities

Architect and implement a unified observability strategy using Dynatrace
Design and deploy distributed tracing across all Spring Boot microservices, ensuring end-to-end transaction visibility
Engineer golden signals dashboards and trace-driven diagnostics that support real-time incident response and long-term trend analysis
Lead instrumentation deep dives: JVM metrics, custom Micrometer metrics, trace attributes, log correlation, and database timing
Implement and tune anomaly detection, alerting strategies, and noise reduction techniques
Develop reference architectures and best practices for observability in hybrid cloud environments
Perform root cause analysis for latency issues, error spikes, and system degradation incidents
Mentor teams on observability tooling and ensure developers adopt instrumentation patterns by default

Qualifications

Spring Boot, Dynatrace, AWS, Kubernetes, Prometheus, Grafana, Container orchestration, Root cause analysis, Observability tooling, Soft skills

Required

Extensive experience with high-scale, multi-region, high-transaction platforms (e.g., financial systems, payment processing, or large enterprise SaaS) running in a cloud environment
Experience defining service-level objectives (SLOs), performance budgets, and latency/throughput targets across services
Experience architecting and championing comprehensive distributed tracing strategies (Dynatrace, AWS X-Ray, etc.)
Ability to analyze application, platform, and cloud behavior using deep-dive techniques such as heap dumps, thread dumps, flame graphs, GC logs, network traces, and storage I/O profiling
Ability to review service and system architectures for performance risks (e.g., synchronous hops, excessive dependencies, misconfigured connection pools, poor cache placement)
Experience conducting and leading root-cause analysis for performance incidents in production and pre-production environments
Experience developing capacity models and performance baselines for services running across cloud environments
Experience operating large-scale, multi-region distributed systems in cloud environments

Company

ShiftCode Analytics, Inc.

ShiftCode Analytics, Inc. is a Tampa, FL-based firm formed with the sole purpose of delivering fast, high-quality services to its clients nationwide.

Funding

Current Stage
Growth Stage
Company data provided by Crunchbase