TestingXperts · 10 hours ago
Lead Observability/SRE Engineer with Sumo Logic experience || Remote USA
TestingXperts is seeking a highly skilled Lead Observability Engineer to lead a critical implementation of Sumo Logic for a client migrating from Dynatrace. This role requires expertise in Sumo Logic and SRE practices, focusing on designing and implementing scalable observability solutions for AWS and Kubernetes environments.
DevOps, Information Technology, Penetration Testing, Quality Assurance, Software, Usability Testing
Responsibilities
Lead the end-to-end implementation of Sumo Logic observability platform for AWS and EKS environments
Migrate monitoring and alerting assets from Dynatrace to Sumo Logic
Define and implement SLIs/SLOs, error budgets, and reliability metrics for containerized services
Deploy and configure Sumo Logic collectors across AWS and Kubernetes workloads (EKS)
Configure log, metric, and trace ingestion pipelines using OpenTelemetry and Sumo Logic apps
Design and maintain dashboards for service health, performance, and reliability insights
Implement intelligent alerting and notification workflows, using thresholds, baselines, and anomaly detection
Collaborate with DevOps, SRE, and development teams to ensure complete tracing coverage across services
Ensure best practices for alert noise reduction, escalation policies, and incident response are in place
Contribute to observability runbooks, operational handover, and training for the client SRE team
Qualifications
Required
Expert-level experience with Sumo Logic, including dashboarding, alerting, collector deployment, and ML features
Strong background in Site Reliability Engineering (SRE), including SLIs/SLOs, error budgets, MTTR/MTTD metrics
Proficiency in AWS services (especially CloudWatch, CloudTrail, Lambda, RDS) and EKS (Amazon Elastic Kubernetes Service)
Hands-on experience with OpenTelemetry for distributed tracing and service maps
Strong understanding of Kubernetes metrics, pod health, container resource usage, and cluster monitoring
Proven ability to define alert thresholds, configure notification routing (e.g. Slack, PagerDuty, ServiceNow), and manage alert fatigue
Strong infrastructure-as-code experience with tools like Terraform, Helm, and YAML, along with GitOps workflows
Experience with incident triage, RCA documentation, and building operational maturity in observability teams
Excellent communication and stakeholder engagement skills
Preferred
Sumo Logic certifications (Admin, Advanced Analytics) are a plus
Experience with Dynatrace (for migration purposes)
Familiarity with integrating observability into CI/CD pipelines
Exposure to service mesh (Istio/Linkerd) and monitoring microservices in that context
Company
TestingXperts
Next Gen QA & Software Testing Company
H1B Sponsorship
TestingXperts has a track record of offering H1B sponsorships. Please note that this does not
guarantee sponsorship for this specific role. Additional information is presented below for your
reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (10)
2024 (1)
2020 (1)
Funding
Current Stage
Late Stage
Recent News
PR Newswire
2025-09-02
Canada NewsWire
2025-08-14
Company data provided by Crunchbase