Senior Manager of Engineering, Production Infrastructure jobs in United States

Klaviyo · 17 hours ago

Senior Manager of Engineering, Production Infrastructure

Klaviyo is a company that empowers creators to own their own destiny by making first-party data accessible and actionable. They are seeking a Senior Manager of Engineering for Production Infrastructure to lead teams in developing platform primitives and enhancing reliability and developer experience across the company.

Advertising, Analytics, E-Commerce, Marketing Automation, Software
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Own and evolve platform primitives in scope (compute runtimes, service networking/ingress, observability) with clear APIs, SLOs, runbooks, and support tiers
Lead by example technically: drive design reviews, review PRs, and author reference implementations, starter repos, and Terraform/Helm modules that demonstrate the golden path
Deliver golden paths and self‑service scaffolding; reduce time‑to‑first‑service and lead time for changes
Raise the bar on reliability: incident response (blameless), alert hygiene, capacity planning, and on‑call health
Be production‑close: participate in critical incident response and postmortems; trace issues across Kubernetes, service mesh, and data paths; convert learnings into durable fixes, guardrails, and policy‑as‑code
Standardize observability end‑to‑end: expand OpenTelemetry adoption, define log/trace schemas, and make SLOs and error budgets first‑class in dashboards and alerts
Evolve our Kubernetes and networking layers: plan cluster upgrades, right‑size node/Pod configs, harden ingress/gateway policies, and advance mTLS/service identity and traffic shaping
Advance CI/CD and GitOps: ensure fast, safe deploys with progressive delivery, automatic rollbacks, and pre‑prod environments that mirror prod; enforce guardrails via policy‑as‑code
Stand up a concise scorecard (SLO coverage, incident frequency/severity, lead time, MTTR, developer platform NPS, cost‑to‑serve) and drive consistent trend improvements
Partner with Security, Data Platform, and Product to clarify ownership boundaries and enable safe, fast delivery
Improve cost‑to‑serve via quotas, right‑sizing, and showback in partnership with Finance
Transform workflows by putting AI at the center, building smarter systems and ways of working from the ground up; pilot AI‑assisted runbooks and incident summarization to shorten resolution time

Qualifications

Kubernetes, SRE practices, Terraform, Observability, CI/CD, GitOps, Incident management, Capacity planning, Service networking, AI fluency, Documentation

Required

7–10+ years in infra/SRE/platform with 3–5+ years leading teams (including managers or staff/lead ICs)
Demonstrated SRE practices (SLI/SLO design, incident mgmt, capacity planning) and experience with Kubernetes/container orchestration, service networking, IaC, and modern observability
Technically credible and hands‑on: comfortable reading and discussing code (e.g., Go, Python, or Java), reviewing PRs, and writing small prototypes/tooling when it accelerates the team
Fluent with Kubernetes internals (scheduling, autoscaling, resource management) and service networking (e.g., Envoy/Istio/Linkerd, API gateways)
Operate the full observability stack (metrics, logs, traces, profiling) and instrument SLIs/SLOs using OpenTelemetry‑friendly patterns
Automate by default: Terraform (or Pulumi), Helm/Kustomize, GitOps, CI/CD; you prefer guardrails and policy‑as‑code over manual gates
You write crisp docs/diagrams and define platform contracts that hold up under scale
You drive measurable developer velocity and reliability improvements and communicate progress with clarity
You build inclusive, high‑trust teams and partner tightly across Security/Product/Finance
You've already experimented with AI in work or personal projects and are eager to deepen your fluency responsibly

Preferred

Platforms 'as a product' (DX metrics, roadmaps), event‑driven architectures, and cost‑to‑serve optimization in high‑growth SaaS
Experience contributing to platform code or tooling (e.g., base images, CLI/scaffolding, controllers/operators, admission/policy), multi‑cluster or multi‑region operations, and progressive delivery

Benefits

Participation in the company’s annual cash bonus plan
Variable compensation (OTE) for sales and customer success roles
Equity
Sign-on payments
A comprehensive range of health, welfare, and wellbeing benefits based on eligibility

Company

Klaviyo is an automation and email platform designed to help grow businesses.

H1B Sponsorship

Klaviyo has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart)
Trends of Total Sponsorships
2025 (47)
2024 (29)
2023 (24)
2022 (27)
2021 (21)
2020 (8)

Funding

Current Stage
Public Company
Total Funding
$1.35B
Key Investors
Shopify, Sands Capital Ventures, Accel
2025-08-13 · Post-IPO Secondary · $195.06M
2025-05-14 · Post-IPO Secondary · $372.95M
2023-09-20 · IPO

Leadership Team

Andrew Bialecki
CEO
Ed Hallen
Co-Founder, Chief Strategy Officer, Board Member
Company data provided by Crunchbase