
Onebrief · 2 months ago

Senior Site Reliability Engineer, Colorado Springs

Onebrief builds collaboration and AI-powered workflow software designed specifically for military staffs. The company is seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, and security of its production applications while collaborating with various teams to improve service quality and incident response.

Information Technology · Military · Productivity Tools · Software
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Implementing a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana). You won't just track metrics; you'll create the actionable insights and automated alerting that allow teams to identify and resolve issues before they impact users.
Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Indicators (SLIs) and Service Level Objectives (SLOs), increasing trust internally and externally. You will be the organization's expert on what it means for our systems to be reliable and how to measure it.
Leading Incident Response: Act as an incident responder, and potentially incident commander, during critical incidents. You will lead blameless post-mortems / After Action Reviews (AARs) that identify true root causes and drive automated, long-term solutions to prevent recurrence.
Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments using Infrastructure-as-Code (Terraform, Ansible). You will embed security and compliance controls (RMF, STIGs) directly into this automation.
Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation. You will partner with other teams to share best practices for air-gapped environments and support their readiness for production.

Qualifications

Site Reliability Engineering · Infrastructure as Code · Kubernetes · Incident Response · AWS · CI/CD · Scripting · Observability · Networking fundamentals · Security-minded design · GitOps practices · Relevant certifications

Required

Active Top Secret clearance
5+ years in Platform, DevOps, or Site Reliability Engineering with an infrastructure and operations focus
Proven partner to DevOps/Platform and application teams; collaborates well across functions and shares context openly
A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement
Infrastructure as Code: Terraform (or CloudFormation), Ansible
Containers and orchestration: Kubernetes design, deployment, and operations
CI/CD: experience building and maintaining pipelines (GitLab CI/CD, Jenkins, GitHub Actions)
Scripting: proficiency with at least one of Python, Go, or Bash
Cloud: familiarity with AWS or AWS GovCloud
Observability: Grafana stack, ELK stack, or Datadog
Networking fundamentals: core protocols and secure configurations

Preferred

Experience in DoD environments and compliance frameworks (RMF, STIGs, ICD 503)
GitOps practices and toolchains
Security‑minded design for sensitive environments
Experience designing and implementing meaningful SLIs/SLOs (including error budgets) for complex, distributed systems
Familiarity with on‑prem virtualization (VMware, Proxmox, Nutanix, Hyper-V, etc.)
Service mesh exposure (Istio, Linkerd)
Relevant certifications (e.g., AWS DevOps Engineer, CKA/CKAD)
Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain a valid credential within 3 months of employment

Benefits

Relocation assistance

Company

Onebrief

Onebrief is an AI-powered platform that supports operational planning, collaboration, and decision workflows for military and command teams.

Funding

Current Stage: Late Stage
Total Funding: $311.04M
Key Investors: Battery Ventures, Human Capital
2026-01-13 · Series D · $200M
2025-06-16 · Series C · $23.58M
2025-01-28 · Series C · $50M

Leadership Team

Grant Demaree
CEO and co-founder
Rafa Pereira
Co-founder and CTO
Company data provided by Crunchbase