
Onebrief · 2 months ago

Senior Site Reliability Engineer, Colorado Springs

Onebrief builds collaboration and AI-powered workflow software designed specifically for military staffs. The Senior Site Reliability Engineer ensures the reliability, scalability, and security of production applications, leads incident response, and automates processes to improve operational efficiency.

Information Technology · Military · Productivity Tools · Software
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Building a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana). You won't just track metrics; you'll create the actionable insights and automated alerting that allow teams to identify and resolve issues before they impact users.
Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Objectives (SLOs) and increases trust internally and externally. You will be the organization's expert on what it means for our systems to be reliable and how to measure it.
Leading Incident Response: Act as the incident responder, and potentially the incident commander, during critical incidents. You will lead blameless post-mortems / After Action Reviews (AARs) that identify true root causes and drive automated, long-term solutions to prevent recurrence.
Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments using Infrastructure-as-Code (Terraform, Ansible). You will embed security and compliance controls (RMF, STIGs) directly into this automation.
Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation. You will act as a force multiplier by advising other teams on best practices in air-gapped environments and production readiness.

Qualifications

Site Reliability Engineering · Kubernetes · AWS · Incident Response · Linux · Terraform · Ansible · Monitoring Tools · Networking Fundamentals · VMware · Docker · Helm · DoD Compliance · Security Minded Design · Security+ Credential · Documentation · Communication

Required

3 years of experience in Site Reliability Engineering or a related field, with firsthand experience managing mission-critical systems within DoD's air-gapped environments
An active Top Secret security clearance. U.S. citizenship required
Experience automating software delivery, deployment, and providing documentation and self-service tools for engineering teams and customers
A strong understanding of Linux, containerization and orchestration, and virtual machines
Experience with centralized logging, metrics, and observability using tools such as Prometheus, Loki, Grafana, ELK stack, or Datadog
Networking fundamentals: core protocols and secure configurations
A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement
Clear, concise writing; strong documentation habits and async communication
Core skills and technologies: VMware, Kubernetes, Docker, Helm, Ansible, Terraform, Linux, AWS, DoD compliance, monitoring and observability tools

Preferred

Experience with compliance frameworks (RMF, STIGs/SRGs, ICD 503)
Security-minded design for air-gapped environments
Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain a valid credential within 3 months of employment

Company

Onebrief

Onebrief is a web-based military planning software for rapid decision-making and collaboration.

Funding

Current Stage: Growth Stage
Total Funding: $111.04M
Key Investors: Battery Ventures, Human Capital
Funding Rounds:
2025-06-16 · Series C · $23.58M
2025-01-28 · Series C · $50M
2024-08-21 · Series B · $16M

Leadership Team

Grant Demaree · CEO and co-founder
Rafa Pereira · Co-founder and CTO
Company data provided by Crunchbase