Senior Incident Manager jobs in United States

Databricks · 2 months ago

Senior Incident Manager

Databricks is a data and AI company that empowers data teams to tackle challenging problems through their infrastructure platform. As a Senior Incident Manager, you will lead critical production incidents, ensuring effective communication and operational resilience while collaborating with engineering teams to improve reliability.

Analytics, Artificial Intelligence (AI), Data Storage, Information Technology, Machine Learning
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Lead critical incidents — coordinate multi-disciplinary response efforts across Databricks’ cloud-based services to rapidly mitigate impact and restore operations
Drive technical root cause analysis and reliability improvements: collaborate with engineering teams to trace and document underlying causes across distributed systems, services, and data stores
Summarize key learnings, clearly communicate action items, and ensure that technical and procedural improvements are followed through
Own communications during incidents — deliver frequent, high-quality updates to internal stakeholders (executives, engineering leadership, support) and compose and publish customer-facing notifications that are accurate, timely, and empathetic
Mentor and train peers in both incident communication and technical response disciplines to raise the overall quality of Databricks’ incident response

Qualifications

Incident management, Site reliability engineering, Cloud infrastructure, Log analysis, Programming language proficiency, Incident playbooks development, Communication skills, Mentoring skills, Technical writing

Required

5+ years of experience in incident management, site reliability engineering, or production operations supporting large-scale, cloud-native systems
Proven ability to lead and coordinate high-severity incidents, including identifying impact, isolating fault domains, and managing multi-team response efforts
Strong understanding of cloud infrastructure (AWS, Azure, or GCP) — including compute, networking, storage, and observability components
Deep expertise in log analysis and debugging
Familiarity with log aggregation and search tools (e.g., Datadog, Elasticsearch, Splunk, Cloud Logging, or OpenTelemetry)
Hands-on experience with observability systems — metrics, logging, and tracing frameworks (Prometheus, Grafana, OpenTelemetry, etc.)
Proficiency in at least one major programming or scripting language (Python, Go, or Bash) for automating diagnostics, data collection, or analysis
Experience developing and maintaining incident playbooks and communication templates to ensure consistent, timely updates
Excellent contextual interpretation and writing skills, with the ability to summarize and communicate effectively to both technical and business audiences
BS, Master's, or other advanced degree in Computer Science, Computer Engineering, or a related engineering field

Benefits

Annual performance bonus
Equity

Company

Databricks

Databricks is a data and AI platform that unifies data engineering, analytics, and machine learning on a lakehouse architecture.

H1B Sponsorship

Databricks has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (385)
2024 (319)
2023 (227)
2022 (222)
2021 (166)
2020 (64)

Funding

Current Stage: Late Stage
Total Funding: $25.81B
Key Investors: Counterpoint Global, Franklin Templeton, Andreessen Horowitz
2025-12-16: Series Unknown, $4B
2025-09-08: Series Unknown, $1B
2025-01-13: Debt Financing, $5.25B

Leadership Team

Ali Ghodsi, CEO and Co-founder
David Conte, Chief Financial Officer
Company data provided by Crunchbase