
Bigeye

Staff Site Reliability Engineer

United States

Full-time

Remote

Lead/Staff

$230K/yr - $240K/yr

8+ years exp

Bigeye is a company that builds trusted tools for enterprises to manage their data and AI with confidence. As a Staff Site Reliability Engineer, you will be responsible for ensuring the reliability and operability of core systems, designing infrastructure for hybrid data and AI workloads, and collaborating with various teams to maintain high standards of service reliability.

Analytics · Artificial Intelligence (AI) · Machine Learning · Software

H1B Sponsor Likely

Responsibilities

Design and evolve a deployment system that orchestrates hybrid application deployments across Bigeye’s AWS infrastructure and customer clouds

Build and maintain CI/CD pipelines so teams can ship changes to production regularly, safely, and with fast feedback loops

Own the infrastructure foundations that let Bigeye scale: capacity planning, environment topology, and cost-aware growth

Automate repetitive workflows and build self-service tooling so developers can provision, deploy, and debug without blocking on the infra team

Define and track SLIs/SLOs for core services; use error budgets and clear metrics to guide reliability work

Improve on-call quality: actionable alerts, clear runbooks, safe rollback paths, faster MTTR, and fewer noisy pages

Design and implement systems and processes to measure reliability over time, and drive concrete improvements after incidents

Gather, analyze, and visualize metrics, logs, and traces to monitor system performance and uncover hidden failure modes

Participate in system design and capacity planning to make sure new features are reliable, operable, and diagnosable from day one

Work directly with customer teams (data, platform, security) to understand their environments, support hybrid deployments, and enable them to run Bigeye with confidence

Qualifications

AWS · CI/CD pipelines · Distributed systems · Programming (Java/Go/Python) · Linux fundamentals · Observability fundamentals · Shell scripting · Kubernetes · Communication · Technical leadership

Required

8–12+ years of total industry experience, with 5–8+ years in SRE/DevOps/platform/infrastructure

Strong technical leadership and ownership of reliability

Experience with large-scale distributed systems

3+ years of experience as a software developer: you've spent time on "the other side of the fence" and can read and write production code

Strong programming skills in at least one of Java, Go, or Python

Experience running services in AWS and working with service-oriented architectures

Strong Linux and systems fundamentals (processes, networking, containers, monitoring, debugging)

Comfortable using shell scripting for glue and automation, and higher-level languages (Python/Go/Java) for larger systems

Hands-on experience with CI/CD pipelines and build/deploy systems

Solid understanding of observability fundamentals (metrics, logs, traces) and how to use them to debug and improve systems

Experience operating user-facing systems in fast-moving environments, including participating in on-call rotations and incident response

Strong communication and ownership; you can work across teams, explain tradeoffs clearly, and drive projects from problem to stable solution

Proven curiosity about the real customer problem, not just the immediate ticket or symptom

Preferred

Experience leading teams or initiatives, either through people management or senior IC leadership (technical direction, mentoring, cross-team collaboration)

Kubernetes experience

Experience with GCP or Azure in addition to AWS

Deep familiarity with AWS networking (e.g., gateways, route tables, NAT)

Experience with hybrid deployments (cloud + customer VPCs / accounts)

Experience with modern observability tools

Familiarity with data platforms (warehouses, ETL/BI, data pipelines)

Benefits

Highly competitive salary and equity opportunity

Medical, dental, and vision coverage to keep you healthy

Health and Wellness package

401k plan to help you save for the future

Unlimited PTO to have fun

Receive an elite technology package to make work easier

Company

Bigeye

Bigeye provides data observability solutions for monitoring and analyzing data integrity.

Founded in 2019

San Francisco, California, USA

11-50 employees

https://www.bigeye.com

H1B Sponsorship

Bigeye has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).


Trends of Total Sponsorships

2024 (2)

2022 (1)

Funding

Current Stage

Growth Stage

Total Funding

$73.5M

Key Investors

USAA · Coatue · Sequoia Capital

2024-10-09 · Series Unknown · $5M

2023-12-06 · Series Unknown · $2.5M

2021-09-16 · Series B · $45M

Leadership Team

Egor Gryaznov

Co-founder and Field CTO

Kyle Kirwan

Co-founder, Chief Product Officer

Recent News

EIN Presswire

Bigeye Announces AI Guardian to Give Enterprises Control Over Agent Data Access

2025-12-10

EIN Presswire

Bigeye Expands Data Quality with Advanced Enterprise-Ready Features

2025-10-02

EIN Presswire

Bigeye Appoints Mohamed K. Alimi as Vice President of Engineering to Lead AI Trust Platform Development

2025-06-18

Company data provided by Crunchbase