Recruiting from Scratch · 4 hours ago
Staff Site Reliability Engineer
Recruiting from Scratch is a high-growth technology company building the world’s first Value Chain Management System. As a Staff Site Reliability Engineer, you will ensure the availability, performance, and scalability of production systems while embedding reliability principles into architecture and operations.
Staffing Agency
Responsibilities
Champion and implement Google-style SRE principles, including SLOs and error budgets
Drive initiatives that improve system reliability, performance, and operational efficiency
Design, implement, and refine observability frameworks across infrastructure and microservices
Build dashboards, alerts, and runbooks that deepen understanding of system behavior
Automate repetitive operational tasks and reduce toil across production environments
Improve deployment pipelines, operational tooling, and incident response processes
Participate in and lead incident management activities, including blameless postmortems
Collaborate with engineering teams to influence system design for operability and cost-efficiency
Identify and resolve performance bottlenecks and architectural issues
Participate in an on-call rotation to ensure rapid and reliable response to critical alerts
Enhance reliability and observability for critical data pipelines and data infrastructure
Qualifications
Required
5+ years in SRE, DevOps, or related roles focused on production reliability
Strong understanding of SRE principles: SLOs, error budgets, toil reduction, and blameless culture
Experience designing and operating observability stacks (e.g., Prometheus, Grafana, Datadog, ELK, OpenTelemetry, Jaeger)
Proficiency with at least one programming/scripting language (Python, Go, etc.)
Hands-on experience with cloud platforms (AWS, Azure, or GCP)
Expertise with Docker and Kubernetes
Experience with Infrastructure-as-Code tools (Terraform, OpenTofu, CloudFormation)
Familiarity with microservices architectures and modern CI/CD pipelines
Strong problem-solving skills with experience debugging complex distributed systems
Excellent communication and collaboration skills
Experience working with data pipelines, large-scale data infrastructure, or data streaming technologies
Preferred
Building or operating reliable large-scale data systems
Advanced automation tooling or internal platform development experience
Prior involvement in scaling infrastructure for high-growth environments
Broader exposure to reliability practices across both data and application layers
Benefits
Equity: Competitive equity grants included in most full-time offers.
Bonus: Eligible for discretionary bonus or variable compensation, depending on role.
Flexible Time Off policy
Industry-leading parental leave (14–26 weeks fully paid, depending on role and situation)
Comprehensive medical, dental, and vision coverage
Employer-paid high-deductible medical plan and HSA contributions
Life, disability, and AD&D insurance
401(k) retirement savings program
Commuter benefits
Wellness benefits, including access to Calm
Pet insurance options
Employee Assistance Program
Dependent Care FSA
Company
Recruiting from Scratch
A recruiting agency that works with technology companies to help them hire for software engineering, data, product management, and hardware roles.
Funding
Current Stage
Early Stage