Doghouse Recruitment · 4 hours ago

Site Reliability Engineer

Doghouse Recruitment is focused on building a cloud platform for high-throughput, compute-heavy workloads. They are seeking a Senior Site Reliability Engineer to own production reliability, define SLIs/SLOs, and improve latency while working in a bare-metal environment.

Staffing & Recruiting
No H1B
Hiring Manager
Sebastiaan Rondhuis

Responsibilities

Define SLIs/SLOs
Run error budget conversations
Ship changes that reduce incidents and improve latency (p95/p99)
Build automation to kill toil
Raise deployment safety (canary/rollback)
Turn observability into signal instead of noise

Qualifications

Site Reliability Engineering, Linux systems debugging, Networking, Kubernetes, Terraform, Docker, Helm, CI/CD practices, Go, Python, C++

Required

Senior-level experience in Site Reliability Engineering / Production Engineering running bare metal / on-prem / data center infrastructure (not public cloud only)
Deep hands-on expertise in Linux systems debugging and performance (CPU, memory, IO, kernel-level behaviors)
Strong understanding of networking (DNS/TCP/TLS, latency, packet loss, congestion, troubleshooting under load)
Strong Kubernetes experience beyond manifests: scheduler behavior, autoscaling edge cases, kubelet pressure/evictions, etcd/control plane
Experience with Terraform, Docker, Helm, and modern CI/CD practices
Some coding skills in Go and/or Python and/or C++

Benefits

Additional bonus and stock

Company

Doghouse Recruitment

Recruitment for your technology teams. You don't need another agency flooding your inbox with mismatched candidates.

Funding

Current Stage
Early Stage
Company data provided by Crunchbase