TestingXperts · 11 hours ago
SRE Centric Engineer | New Jersey Hybrid
TestingXperts is seeking an SRE Centric Engineer to strengthen its Site Reliability Engineering efforts. The role involves ensuring the reliability and performance of distributed systems, managing incident response, and implementing automation for self-healing systems.
Qualifications
Required
8+ years in Site Reliability Engineering, Production Engineering, or equivalent roles
Deep expertise in distributed systems, resilience engineering, and large‑scale production operations
Strong proficiency with observability stacks: Metrics, logs, traces, Splunk, ELK, New Relic, synthetic monitoring, APM
Advanced experience with service‑level objectives (SLOs), SLIs, error budgets, and reliability governance
Expertise in Kubernetes, container orchestration, and workload reliability patterns
Strong skills in incident management, on‑call response, war‑room leadership, and RCA methodologies
Proven ability to engineer automation/self‑healing systems (auto‑remediation, failure‑mode detection)
Strong scripting/automation skills in Python, Bash, or similar languages
Solid understanding of traffic distribution, load balancing, session handling, and failure isolation
Expert debugging and performance troubleshooting across the full stack (network, compute, services)
Experience with AWS (EKS/ECS, SQS/SNS, S3, CloudFront, etc.)
Preferred
Experience implementing AIOps, alert correlation, noise reduction, or automated RCA frameworks
Background in building paved paths, golden templates, or policy‑as‑code reliability gates
Experience with reverse proxy troubleshooting, including rate limits, affinity, and routing logic
Prior experience in high‑throughput government or regulated environments
Performance/load testing experience (designing tests, analyzing throughput, identifying bottlenecks)
Strong understanding of release reliability, risk recording, and continuous deployment safeguards
Familiarity with monitoring‑as‑code or dashboards‑as‑code practices
Hands‑on experience with infrastructure‑as‑code (Terraform preferred)
Company
TestingXperts
Next Gen QA & Software Testing Company
H1B Sponsorship
TestingXperts has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (10)
2024 (1)
2020 (1)
Funding
Current Stage
Late Stage
Recent News
PR Newswire
2025-09-02
Canada NewsWire
2025-08-14
Company data provided by Crunchbase