Production Support & Reliability Engineer jobs in United States

Cyberhaven · 1 day ago

Production Support & Reliability Engineer

Cyberhaven is a company focused on revolutionizing data security through AI-enabled data lineage. The Production Support & Reliability Engineer will act as the primary L3 owner for on-prem Content Inspection deployments and incidents, manage customer issues through to resolution, and drive supportability improvements.

Artificial Intelligence (AI), Cloud Security, Cyber Security, Information Technology, Network Security, Security
H1B Sponsor Likely

Responsibilities

Act as the primary L3 owner for all support cases involving on-prem Content Inspection (CI), including deployments, upgrades, performance, correctness, and stability
Own customer issues end-to-end, from initial triage and reproduction in lab or test clusters through root-cause analysis, remediation, or escalation
Serve as the internal and external point of contact for high-severity CI incidents, joining customer bridges and coordinating closely with SRE, R&D, and Product until resolution
Lead or co-lead on-prem CI deployments and upgrades alongside Professional Services and SRE, validating prerequisites, reviewing Kubernetes and Helm configurations, and coordinating maintenance windows and rollback plans
Monitor and support on-prem CI environments, understand CI metrics and logs, and drive first-line response to capacity, health, latency, and stability issues
Act as the L3 specialist for DSPM on-prem connectors that depend on CI, validating new capabilities and supporting large-scale and design-partner deployments
Design, build, and maintain a Support-owned on-prem CI lab across AWS, GCP, and Azure to reproduce customer issues, validate fixes, and test upgrades and mitigations
Create and maintain internal runbooks and knowledge base documentation for CI deployments, upgrades, troubleshooting, and escalation best practices
Enable internal teams by training Support, Professional Services, CX, SRE, and R&D on CI workflows, escalation patterns, and customer-facing considerations
Drive supportability improvements by filing and owning engineering feedback related to diagnostics, logging, health checks, and safer upgrade and rollback behavior
Partner cross-functionally with SRE, R&D, and Product Management to improve CI reliability and scalability, align releases with customer commitments, and participate in design and readiness reviews

Qualifications

Linux, Kubernetes, Containers, Networking, Technical Support, Communication Skills, Problem Solving, Documentation

Required

5+ years of experience in technical support, SRE, or production operations for a SaaS or security product
Deep, hands‑on experience with Linux, containers, and Kubernetes (EKS, GKE, AKS, or self‑managed clusters)
Experience deploying and supporting virtual appliances or on‑prem products in enterprise environments
Strong understanding of networking (load balancers, TLS, DNS, proxies, firewalls) in hybrid/on‑prem settings
Comfortable reading and interpreting logs and metrics from distributed systems; able to form and test hypotheses quickly
Able to manage high‑priority incidents calmly, communicate clearly with enterprise customers, and coordinate multiple internal teams
Curious, self‑directed, and comfortable taking ownership of ambiguous problems until they are fully resolved
Excellent written and verbal communication skills; able to turn repeated patterns into clear runbooks and training material

Company

Cyberhaven

Cyberhaven is an AI-powered data security company focused on detecting and preventing data loss and insider threats and on protecting cloud data.

H1B Sponsorship

Cyberhaven has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2024 (1)

Funding

Current Stage: Late Stage
Total Funding: $236.5M
Key Investors: StepStone Group, Redpoint, Accomplice
2025-04-02: Series D, $100M
2024-06-11: Series C, $88M
2021-12-14: Series B, $33M

Leadership Team

Cristian Zamfir
Co-founder / VP Reliability and Security at Cyberhaven
George Candea
Co-founder & Chief Scientist
Company data provided by Crunchbase