Recruiting from Scratch · 3 days ago

Staff Site Reliability Engineer

United States

Full-time

Onsite

Senior Level, Lead/Staff

$170K/yr - $220K/yr

5+ years of experience

Recruiting from Scratch is a high-growth technology company building the world’s first Value Chain Management System. As a Staff Site Reliability Engineer, you will ensure the availability, performance, and scalability of production systems while embedding reliability principles into architecture and operations.

Staffing Agency

Growth Opportunities

Responsibilities

Champion and implement Google-style SRE principles, including SLOs and error budgets

Drive initiatives that improve system reliability, performance, and operational efficiency

Design, implement, and refine observability frameworks across infrastructure and microservices

Build dashboards, alerts, and runbooks that deepen understanding of system behavior

Automate repetitive operational tasks and reduce toil across production environments

Improve deployment pipelines, operational tooling, and incident response processes

Participate in and lead incident management activities, including blameless postmortems

Collaborate with engineering teams to influence system design for operability and cost-efficiency

Identify and resolve performance bottlenecks and architectural issues

Participate in an on-call rotation to ensure rapid and reliable response to critical alerts

Enhance reliability and observability for critical data pipelines and data infrastructure

Qualifications

SRE principles, Observability stacks, Cloud platforms, Docker, Kubernetes, Programming language, Infrastructure-as-Code, Microservices architecture, Problem-solving skills, Communication skills, Collaboration skills

Required

5+ years in SRE, DevOps, or related roles focused on production reliability

Strong understanding of SRE principles: SLOs, error budgets, toil reduction, and blameless culture

Experience designing and operating observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry, Jaeger)

Proficiency with at least one programming/scripting language (Python, Go, etc.)

Hands-on experience with cloud platforms (AWS, Azure, or GCP)

Expertise with Docker and Kubernetes

Experience with Infrastructure-as-Code tools (Terraform, OpenTofu, CloudFormation)

Familiarity with microservices architectures and modern CI/CD pipelines

Strong problem-solving skills with experience debugging complex distributed systems

Excellent communication and collaboration skills

Experience working with data pipelines, large-scale data infrastructure, or data streaming technologies

Preferred

Building or operating reliable large-scale data systems

Advanced automation tooling or internal platform development experience

Prior involvement in scaling infrastructure for high-growth environments

Broader exposure to reliability practices across both data and application layers

Benefits

Flexible Time Off policy

Industry-leading parental leave (14–26 weeks, fully paid based on role and situation)

Comprehensive medical, dental, and vision coverage

Employer-paid high-deductible medical plan and HSA contributions

Life, disability, and AD&D insurance

401(k) retirement savings program

Commuter benefits

Wellness benefits, including access to Calm

Pet insurance options

Employee Assistance Program

Dependent Care FSA

Company

Recruiting from Scratch

A recruiting agency that works with technology companies to help them hire for software engineering, data, product management, and hardware roles.

Founded in 2021

Albany, New York, USA

11-50 employees

https://www.recruitingfromscratch.com/

Funding

Current Stage

Early Stage

Leadership Team

Will Sanders

Founder / CEO

Tom Callahan

Managing Partner, Retained Executive Search

Recent News

Recruiting from Scratch on LinkedIn: Apply to Staff Product Manager, AI Research at Recruiting From Scratch

2024-04-06

Company data provided by Crunchbase