
PDS · 8 hours ago

Site Reliability Engineer

PDS is seeking an experienced Site Reliability Engineer with strong observability expertise to enhance transaction traceability, performance, and resiliency across a complex enterprise environment. The role focuses on building visibility into critical transaction flows and collaborating with cross-functional teams to implement observability frameworks and optimize system performance.

Computer · Information Technology · Software · Staffing Agency

Responsibilities

Design and implement observability frameworks for full transaction traceability across microservices, APIs, databases, and third-party integrations
Utilize tools such as Dynatrace, OpenTelemetry, ELK, and Grafana to visualize dependencies and build actionable dashboards, alerts, and real‑time performance insights
Monitor latency, throughput, and failures to identify bottlenecks
Use telemetry and distributed tracing to troubleshoot and optimize transaction performance
Partner with application and database teams to improve system efficiency
Work with architects, engineering teams, and stakeholders to define observability standards and resiliency requirements
Establish monitoring best practices and provide training across teams
Identify and prioritize business‑critical transaction paths
Implement redundancy, failover strategies, and fault‑tolerant architectures
Support chaos engineering initiatives and resiliency testing
Define and measure SLOs and SLIs for critical transaction paths
Maintain detailed documentation of transaction flows and monitoring configurations
Produce regular reporting on system performance, resiliency, and improvement initiatives
Create incident playbooks and reusable observability frameworks
Achieve a 30% reduction in MTTD and MTTR within the first year
Identify the offending service/root cause for at least 70% of incidents within one hour
Detect 90% of issues through automated monitoring
Contribute to a culture of continuous improvement and knowledge sharing
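
As an illustration of how the MTTD, MTTR, and automated-detection targets listed above might be tracked, here is a minimal sketch in Python. It is not PDS's tooling: the `Incident` record, its field names, and the helper functions are hypothetical, and MTTR is assumed here to run from fault start to resolution (some teams measure it from detection instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    started: datetime     # when the fault began
    detected: datetime    # when the fault was first noticed
    resolved: datetime    # when service was restored
    auto_detected: bool   # True if automated monitoring raised it first


def mttd_minutes(incidents):
    """Mean time to detect, in minutes (fault start to detection)."""
    return mean((i.detected - i.started).total_seconds() / 60 for i in incidents)


def mttr_minutes(incidents):
    """Mean time to resolve, in minutes (fault start to resolution)."""
    return mean((i.resolved - i.started).total_seconds() / 60 for i in incidents)


def auto_detection_rate(incidents):
    """Fraction of incidents first raised by automated monitoring."""
    return sum(i.auto_detected for i in incidents) / len(incidents)


def reduction(baseline, current):
    """Fractional improvement of a metric relative to its baseline."""
    return (baseline - current) / baseline


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    sample = [
        Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=60), True),
        Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=120), False),
    ]
    print("MTTD (min):", mttd_minutes(sample))
    print("MTTR (min):", mttr_minutes(sample))
    print("Auto-detected:", auto_detection_rate(sample))
    # Compare against a baseline MTTR (an invented figure for illustration).
    print("MTTR reduction vs. 120 min baseline:", reduction(120, mttr_minutes(sample)))
```

Under these assumptions, the first-year goal would correspond to `reduction(...)` reaching at least 0.30 for both metrics and `auto_detection_rate(...)` reaching at least 0.90.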

Qualifications

Dynatrace · AWS · Observability frameworks · Microservices · Scripting languages · Chaos engineering · Collaboration · Documentation

Required

5+ years in SRE, Observability, or related engineering roles
Hands-on experience with Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger, or similar tools
Strong background with AWS, Azure, or GCP
Solid understanding of microservices, APIs, and distributed systems
Proficiency with scripting or programming languages (Python, Go, Java)

Preferred

Dynatrace Associate or Professional Certification
Experience with OpenTelemetry and observability standards
Familiarity with chaos engineering practices
Experience with AIOps and automation-driven monitoring

Company

PDS

PDS is one of the leading aerospace, information technology (IT), and engineering consulting firms in the Western United States.

Funding

Current Stage: Growth Stage

Leadership Team

Thomas Sweetman
President & Chief Executive Officer
Company data provided by Crunchbase