Lead Site Reliability Engineer, Observability (Remote, North America) jobs in United States

Vivun · 1 month ago

Lead Site Reliability Engineer, Observability (Remote, North America)

Vivun delivers Ava, the AI Sales Teammate for high-velocity sales teams that helps sellers work smarter and faster. We are seeking a Lead Site Reliability Engineer to rebuild and own our observability strategy, creating frameworks and tooling for performance measurement and reliability maintenance as we scale.

Artificial Intelligence (AI), B2B, CRM, Enterprise Software, Machine Learning, Sales, Sales Automation, Software
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Own the end-to-end observability strategy for Ava, defining the standards, tools, and patterns that ensure reliable visibility across infrastructure and agentic components
Design and implement correlation models that link agent behavior, LLM interactions, and SaaS telemetry into cohesive, actionable insights
Unify observability tooling across teams, ensuring metrics, logs, and traces flow into a central platform (e.g., Observe, Datadog, or equivalent)
Collaborate with engineering and QA to embed observability best practices into development workflows, CI/CD, and quality gates
Establish enablement frameworks—documentation, dashboards, and templates—that make observability self-serve for all engineering teams
Partner with teammates to ensure observability aligns with infrastructure reliability, alerting, and incident response patterns
Contribute to performance and reliability strategy, helping define how we measure agent quality, responsiveness, and system scalability

Qualifications

Observability tooling, SRE experience, Agentic systems, Distributed tracing, Python SDKs, Collaboration skills, Communication skills

Required

6+ years of experience in SRE, DevOps, or Observability Engineering roles, including at least 2 years leading or designing observability initiatives
Deep knowledge of observability tooling (e.g., OpenTelemetry, Prometheus, Grafana, Datadog, Honeycomb, Observe, etc.) and distributed tracing practices
Experience with agentic or LLM-based systems, including tools like LangChain, Celery, OpenAI APIs, or similar orchestration frameworks
Strong understanding of how to instrument, trace, and correlate AI/LLM workflows with infrastructure-level telemetry
Proven ability to define cross-team standards, influence engineering culture, and establish scalable monitoring patterns
Strong collaboration and communication skills—you enable, not dictate

Preferred

Experience building observability into hybrid SaaS + agent architectures
Background in data pipelines or analytics observability (e.g., tracing data lineage, monitoring model drift)
Familiarity with Python- or Node.js-based observability SDKs
Prior experience scaling observability in a startup or rapid-growth environment

Benefits

Full health benefits
Stock options at a well-funded, pre-IPO company on a fast growth track
Flexible work schedules and work from anywhere at a fully remote company
Unlimited PTO with two weeks designated as “quiet period” each year

Company

Vivun

Vivun offers an AI Sales Agent that automates tasks for Account Executives (AEs), allowing them to focus on higher-level strategies.

H1B Sponsorship

Vivun has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of job fields receiving sponsorship]
Trends of total sponsorships: 2020 (1)

Funding

Current Stage: Growth Stage
Total Funding: $131M
Key Investors: Salesforce Ventures, Menlo Ventures, Accel
2022-05-17 · Series C · $75M
2021-02-10 · Series B · $35M
2020-10-14 · Series A · $18M

Leadership Team

Claire Bruce
Co-Founder & COO
Dominique Darrow
Co-Founder & CCO
Company data provided by Crunchbase