Recruiting from Scratch · 1 month ago
Senior Software Engineer, Site Reliability Tooling
Recruiting from Scratch is a specialized talent firm dedicated to helping companies build exceptional teams. They are seeking a Senior Software Engineer focused on Site Reliability Tooling to enhance the reliability and observability of large-scale production systems through the design and development of internal tools.
Staffing Agency
Responsibilities
Champion SRE principles across engineering and promote a strong culture of service ownership and reliability
Build internal tooling from scratch to improve observability, monitoring, alerting, and operational workflows
Implement standards to monitor microservices, web apps, mobile apps, machine learning systems, databases, and Kubernetes clusters
Improve incident response processes, including on-call workflows, retrospectives, and reliability reporting
Automate toil through infrastructure tooling, scripts, and scalable platform services
Help define the long-term strategy for reliability, disaster preparedness, and operational risk mitigation
Collaborate across multiple engineering groups to deliver enterprise-wide reliability initiatives
Qualifications
Required
6+ years of combined experience in Software Engineering, Site Reliability Engineering, and/or DevOps
Strong proficiency in Python, Go, and/or JavaScript/TypeScript
Hands-on experience with Infrastructure-as-Code (Terraform, CDK, CloudFormation)
Proven background building internal tooling and applying strong software engineering fundamentals (architecture, testing, TDD)
Strong grounding in data structures and algorithms
Experience with on-call, incident response, and incident management workflows
Experience with modern observability tools such as Datadog, Prometheus, Grafana, CloudWatch
Experience supporting high-scale SaaS systems in microservice cloud environments
Ability to work cross-functionally to drive large engineering initiatives
Data-driven mindset focused on metrics, reliability, and continuous improvement
Preferred
Experience with service mesh technologies
Full-stack engineering capabilities
Background building tooling for observability or monitoring platforms
Experience leveraging LLMs / GenAI to improve SRE workflows (chatops, auto-remediation, alert summarization, etc.)
Benefits
Comprehensive medical, dental, and vision coverage with HSA contributions
401(k) with 100% match up to $4,500 (immediate vesting)
Employee Stock Purchase Plan
Life and disability insurance
Flexible vacation, holidays, sick leave, and safety leave
Parental, family care, and military leave
Annual wellness, technology, and ergonomic reimbursements
Team events, ERGs, volunteer groups
When onsite: catered lunches, snacks, and drinks
Quarterly team onsite sessions (travel covered)
Company
Recruiting from Scratch
A recruiting agency that works with technology companies to help them hire for software engineering, data, product management, and hardware roles.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase