
Northwood · 2 hours ago

Senior Site Reliability Engineer

Northwood is looking for a Senior Site Reliability Engineer to architect and lead the monitoring and reliability systems that keep satellites connected to Earth. This high-impact leadership role involves designing and building observability infrastructure for space communications systems while mentoring junior engineers and establishing SRE practices across the organization.

Satellite Communication
Diversity & Inclusion

Responsibilities

Architect and maintain enterprise observability stack (Grafana, Prometheus, Loki, Vector, VictoriaMetrics) monitoring ground stations, satellite communications, and multi-region AWS infrastructure
Design SRE practices, error budgets, and SLO/SLI frameworks for mission-critical satellite systems with 99.9%+ uptime requirements
Build advanced AWS infrastructure with Terraform, implementing multi-region reliability, automated scaling, and disaster recovery for ground station operations
Lead CI/CD pipeline architecture using GitLab and ArgoCD with advanced deployment strategies for mission-critical software releases
Mentor junior engineers and establish reliability standards across the growing engineering organization
Design comprehensive Kubernetes deployments with Helm, focusing on high availability and zero-downtime operations
Lead incident response, conduct post-mortems, and drive systematic reliability improvements

Qualifications

Kubernetes, AWS, Terraform, CI/CD, SRE principles, Docker, Observability tools, Linux administration, Networking expertise, Soft skills

Required

5-8 years of production infrastructure and SRE experience with demonstrated leadership in reliability improvements and team mentorship
Expert-level experience with Kubernetes, Docker, and container orchestration in large-scale production environments
Strong background in infrastructure as code (Terraform) and advanced CI/CD practices with experience mentoring others on these technologies
Advanced AWS experience including multi-region architectures, networking, security, and cost optimization, with demonstrated ability to architect complex cloud solutions
Proven track record of leading technical projects from conception to production in fast-moving, high-growth environments
Deep understanding of SRE principles, error budgets, SLOs/SLIs, and experience implementing reliability frameworks across engineering organizations

Preferred

Production experience architecting and scaling observability tools (Vector, Loki, Grafana, Prometheus, VictoriaMetrics) in high-throughput environments
Advanced experience with HashiCorp Vault, Okta, and enterprise identity/secrets management systems including policy design and implementation
Previous experience scaling infrastructure and leading technical teams at high-growth companies (startup to 500+ employees)
AWS Professional certification or equivalent demonstrated expertise with advanced cloud networking, security, and compliance frameworks
Strong Linux system administration and networking expertise with experience troubleshooting complex distributed systems
Background in aerospace, telecommunications, defense contracting, or other mission-critical, highly regulated industries
Experience with ITAR, NIST 800-171, or other defense/aerospace compliance requirements

Company

Northwood

Northwood was founded by Bridgit Mendler, Griffin Cleverly, and Shaurya Luthra with the mission to expand access to space by transforming satellite backhaul infrastructure.

Funding

Current Stage
Early Stage
Total Funding
$36.4M
Key Investors
Harvard Innovation Labs
2025-04-22 · Series A · $30M
2024-02-19 · Seed · $6.3M
2023-02-08 · Grant · $0.1M

Leadership Team

Bridgit Mendler
CEO and Co-Founder
Griffin Cleverly
Co-Founder, CTO
Company data provided by Crunchbase