Relevance AI
Founding Site Reliability Engineer
Relevance AI is building the home of the AI workforce, empowering teams to delegate meaningful work to AI agents. The Founding Site Reliability Engineer will establish and scale the SRE discipline, ensuring the reliability, scalability, and security of the platform while collaborating closely with founders and engineering teams.
Agentic AI · Analytics · Artificial Intelligence (AI) · Generative AI · Machine Learning · Software
Responsibilities
Own SRE, establishing best practices, tooling, and culture
Tackle reliability challenges unique to multi-agent orchestration at enterprise scale
Guarantee >99.9% uptime of production systems, ensuring reliability at global scale
Architect and automate AWS infrastructure with Terraform and CI/CD pipelines
Design observability systems across microservices, APIs, and vector infrastructure (metrics, tracing, logging)
Reduce incident frequency and mean time to recovery (MTTR) through runbooks, alerting, and strong incident response
Help scale infra to support hundreds of thousands of agents and billions of API calls
Partner with engineering teams to embed SRE principles into the SDLC and shape org-wide reliability strategy
Act as a founding voice in our SF office, influencing product direction and engineering culture
Qualifications
Required
5+ years in SRE/DevOps/Infrastructure roles, with experience in enterprise SaaS environments
Deep AWS expertise (EC2, ECS/EKS, Lambda, RDS, VPC, IAM)
Proven track record with Infrastructure as Code (Terraform, Kubernetes/EKS, CDK, or CloudFormation)
Hands-on with observability stacks (CloudWatch, Grafana, Prometheus, Datadog)
Incident management experience in production SaaS systems, including on-call, postmortems, and reliability improvements
Preferred
Prior exposure to AI/ML platforms, data-heavy systems, or multi-agent workloads
Benefits
Health Insurance Contribution – Relevance AI contributes to the cost of individual medical, dental, and vision insurance for employees.
Commuter Benefits – Save on your commute with pre-tax deductions for transit and parking expenses
Unlimited Annual Leave – Flexible time off policy to rest, recharge, and take care of what matters most
ESOP – Employee Stock Ownership Plan so you can grow with the company
AI Productivity Benefit – Get up to $1200 USD/year to spend on AI tools, courses, and learning resources that help you work smarter and grow your skills
Parental Leave – We offer 12 weeks of paid parental leave for all eligible new parents, and an additional 6 weeks for the birthing parent
Milestone Merch – Celebrate your work anniversaries with customised Relevance AI swag
Food, Drinks & Community – Stay energised with free breakfasts, healthy snacks, and a fully stocked fridge of drinks. Enjoy team lunches provided every Thursday and Friday, plus Uber Eats dinners and regular catered office meals throughout the week. As the home of the AI workforce, we also host vibrant community events featuring thought leaders, industry partners, and the wider tech community.
Quarterly Team Events – Build stronger connections through fun, meaningful team bonding experiences every quarter
Social Clubs – Share your hobbies and interests by joining or starting a club with your teammates. From hiking and chess to board game nights and social committee activities—there’s something for everyone!
Sonder EAP – Access 24/7 mental health and wellbeing support through Sonder, our Employee Assistance Program
Company
Relevance AI
Relevance AI provides an AI agent operating system that helps companies automate repetitive reasoning tasks.
Funding
Current Stage: Growth Stage
Total Funding: $42M
Key Investors: Bessemer Venture Partners, King River Capital, Insight Partners
2025-05-06 · Series B · $24M
2023-12-12 · Series A · $15M
2021-12-12 · Seed · $3M
Company data provided by Crunchbase