Staff Site Reliability Engineer jobs in United States

Recruiting from Scratch · 8 hours ago

Staff Site Reliability Engineer

Recruiting from Scratch is hiring on behalf of a high-growth technology company that is building the world’s first Value Chain Management System. As a Staff Site Reliability Engineer, you will ensure the availability, performance, and scalability of production systems and embed reliability principles into architecture and operations.

Staffing Agency
Growth Opportunities

Responsibilities

Champion and implement Google-style SRE principles, including SLOs and error budgets
Drive initiatives that improve system reliability, performance, and operational efficiency
Design, implement, and refine observability frameworks across infrastructure and microservices
Build dashboards, alerts, and runbooks that deepen understanding of system behavior
Automate repetitive operational tasks and reduce toil across production environments
Improve deployment pipelines, operational tooling, and incident response processes
Participate in and lead incident management activities, including blameless postmortems
Collaborate with engineering teams to influence system design for operability and cost-efficiency
Identify and resolve performance bottlenecks and architectural issues
Participate in an on-call rotation to ensure rapid and reliable response to critical alerts
Enhance reliability and observability for critical data pipelines and data infrastructure

Qualifications

SRE principles, Observability stacks, Cloud platforms, Docker, Kubernetes, Programming languages, Infrastructure-as-Code, Microservices architecture, Problem-solving skills, Communication skills, Collaboration skills

Required

5+ years in SRE, DevOps, or related roles focused on production reliability
Strong understanding of SRE principles: SLOs, error budgets, toil reduction, and blameless culture
Experience designing and operating observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry, Jaeger)
Proficiency with at least one programming/scripting language (Python, Go, etc.)
Hands-on experience with cloud platforms (AWS, Azure, or GCP)
Expertise with Docker and Kubernetes
Experience with Infrastructure-as-Code tools (Terraform, OpenTofu, CloudFormation)
Familiarity with microservices architectures and modern CI/CD pipelines
Strong problem-solving skills with experience debugging complex distributed systems
Excellent communication and collaboration skills
Experience working with data pipelines, large-scale data infrastructure, or data streaming technologies

Preferred

Building or operating reliable large-scale data systems
Advanced automation tooling or internal platform development experience
Prior involvement in scaling infrastructure for high-growth environments
Broader exposure to reliability practices across both data and application layers

Benefits

Equity: Competitive equity grants included in most full-time offers.
Bonus: Eligible for discretionary bonus or variable compensation, depending on role.
Flexible Time Off policy
Industry-leading parental leave (14–26 weeks, fully paid; length depends on role and situation)
Comprehensive medical, dental, and vision coverage
Employer-paid high-deductible medical plan and HSA contributions
Life, disability, and AD&D insurance
401(k) retirement savings program
Commuter benefits
Wellness benefits, including access to Calm
Pet insurance options
Employee Assistance Program
Dependent Care FSA

Company

Recruiting from Scratch

A recruiting agency that works with technology companies to help them hire software engineers, data professionals, product managers, and hardware engineers.

Funding

Current Stage
Early Stage

Leadership Team

Will Sanders
Founder / CEO
Tom Callahan
Managing Partner, Retained Executive Search
Company data provided by Crunchbase