Lead Observability/SRE Engineer with Sumo Logic experience || Remote USA
This job has closed.

TestingXperts · 10 hours ago

TestingXperts is seeking a highly skilled Lead Observability Engineer to lead a critical implementation of Sumo Logic for a client migrating from Dynatrace. This role requires expertise in Sumo Logic and SRE practices, focusing on designing and implementing scalable observability solutions for AWS and Kubernetes environments.

DevOps, Information Technology, Penetration Testing, Quality Assurance, Software, Usability Testing
H1B Sponsor Likely

Responsibilities

Lead the end-to-end implementation of Sumo Logic observability platform for AWS and EKS environments
Migrate monitoring and alerting assets from Dynatrace to Sumo Logic
Define and implement SLIs/SLOs, error budgets, and reliability metrics for containerized services
Deploy and configure Sumo Logic collectors across AWS and Kubernetes workloads (EKS)
Configure log, metric, and trace ingestion pipelines using OpenTelemetry and Sumo Logic apps
Design and maintain dashboards for service health, performance, and reliability insights
Implement intelligent alerting and notification workflows, using thresholds, baselines, and anomaly detection
Collaborate with DevOps, SRE, and development teams to ensure complete tracing coverage across services
Ensure best practices for alert noise reduction, escalation policies, and incident response are in place
Contribute to observability runbooks, operational handover, and training for the client SRE team

Qualifications

Sumo Logic, Site Reliability Engineering, AWS services, OpenTelemetry, Kubernetes, Terraform, Incident triage, Stakeholder engagement, Dynatrace, CI/CD integration, Service mesh, Communication

Required

Expert-level experience with Sumo Logic, including dashboarding, alerting, collector deployment, and ML features
Strong background in Site Reliability Engineering (SRE), including SLIs/SLOs, error budgets, and MTTR/MTTD metrics
Proficiency in AWS services (especially CloudWatch, CloudTrail, Lambda, RDS) and EKS (Amazon Elastic Kubernetes Service)
Hands-on experience with OpenTelemetry for distributed tracing and service maps
Strong understanding of Kubernetes metrics, pod health, container resource usage, and cluster monitoring
Proven ability to define alert thresholds, configure notification routing (e.g. Slack, PagerDuty, ServiceNow), and manage alert fatigue
Strong scripting and infrastructure-as-code experience with tools like Terraform, Helm, and YAML, and with GitOps workflows
Experience with incident triage, RCA documentation, and building operational maturity in observability teams
Excellent communication and stakeholder engagement skills

Preferred

Sumo Logic certifications (Admin, Advanced Analytics) are a plus
Experience with Dynatrace (for migration purposes)
Familiarity with integrating observability into CI/CD pipelines
Exposure to service mesh (Istio/Linkerd) and monitoring microservices in that context

Company

TestingXperts

Next Gen QA & Software Testing Company

H1B Sponsorship

TestingXperts has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships: 2025 (10), 2024 (1), 2020 (1)

Funding

Current Stage: Late Stage

Leadership Team

Manish Gupta, Founder & CEO
Archana Gupta, CFO
Company data provided by Crunchbase