Senior Software Engineer, Site Reliability Tooling jobs in United States

Recruiting from Scratch · 1 month ago

Senior Software Engineer, Site Reliability Tooling

Recruiting from Scratch is a specialized talent firm dedicated to helping companies build exceptional teams. They are seeking a Senior Software Engineer focused on Site Reliability Tooling to enhance the reliability and observability of large-scale production systems through the design and development of internal tools.

Staffing Agency
Growth Opportunities

Responsibilities

Champion SRE principles across engineering and promote a strong culture of service ownership and reliability
Build internal tooling from scratch to improve observability, monitoring, alerting, and operational workflows
Implement standards to monitor microservices, web apps, mobile apps, machine learning systems, databases, and Kubernetes clusters
Improve incident response processes, including on-call workflows, retrospectives, and reliability reporting
Automate toil through infrastructure tooling, scripts, and scalable platform services
Help define the long-term strategy for reliability, disaster preparedness, and operational risk mitigation
Collaborate across multiple engineering groups to deliver enterprise-wide reliability initiatives

Qualifications

Python, Infrastructure-as-Code, Data structures, Algorithms, Observability tools, Service mesh technologies, Full-stack engineering, Incident management, Data-driven mindset, Cross-functional collaboration

Required

6+ years combined experience in Software Engineering, Site Reliability Engineering, and/or DevOps
Strong proficiency in Python, Go, and/or JavaScript/TypeScript
Hands-on experience with Infrastructure-as-Code (Terraform, CDK, CloudFormation)
Proven background building internal tooling and applying strong software engineering fundamentals (architecture, testing, TDD)
Strong grounding in data structures and algorithms
Experience with on-call, incident response, and incident management workflows
Experience with modern observability tools such as Datadog, Prometheus, Grafana, CloudWatch
Experience supporting high-scale SaaS systems in microservice cloud environments
Ability to work cross-functionally to drive large engineering initiatives
Data-driven mindset focused on metrics, reliability, and continuous improvement

Preferred

Experience with service mesh technologies
Full-stack engineering capabilities
Background building tooling for observability or monitoring platforms
Experience leveraging LLMs / GenAI to improve SRE workflows (chatops, auto-remediation, alert summarization, etc.)

Benefits

Comprehensive medical, dental, and vision coverage with HSA contributions
401(k) with 100% match up to $4,500 (immediate vesting)
Employee Stock Purchase Plan
Life and disability insurance
Flexible vacation, holidays, sick leave, and safety leave
Parental, family care, and military leave
Annual wellness, technology, and ergonomic reimbursements
Team events, ERGs, volunteer groups
When onsite: catered lunches, snacks, and drinks
Quarterly team onsite sessions (travel covered)

Company

Recruiting from Scratch

A recruiting agency that works with technology companies to help them hire for software engineering, data, product management, and hardware roles.

Funding

Current Stage
Early Stage

Leadership Team

Will Sanders
Founder / CEO
Tom Callahan
Managing Partner, Retained Executive Search
Company data provided by Crunchbase