Site Reliability Engineer (Space Communications) jobs in United States

Northwood · 6 hours ago

Site Reliability Engineer (Space Communications)

Northwood is on a mission to transform connectivity between Earth and space through innovations in space communications technologies. The Site Reliability Engineer will build monitoring and reliability systems for satellites, ensuring they operate 24/7 and supporting mission-critical operations.

Aerospace · Hardware · Satellite Communication
Diversity & Inclusion
No H1B · U.S. Citizen Only

Responsibilities

Build and maintain an observability stack (Grafana, Prometheus, Loki, Vector, VictoriaMetrics) that monitors ground stations, satellite communication systems, and cloud infrastructure across multiple AWS regions
Support CI/CD pipelines using GitLab and ArgoCD, partnering with development teams to ensure reliable deployments of mission-critical software
Develop and maintain AWS infrastructure using Terraform, with a focus on multi-region reliability and automated scaling for ground station operations
Deploy and manage Kubernetes applications with Helm, ensuring both developer productivity and system uptime for satellite communication services
Establish monitoring strategies, alerting frameworks, and incident response procedures for infrastructure supporting real-time satellite communications
Participate in on-call rotation and lead post-incident reviews to continuously improve system reliability

Qualifications

Kubernetes · AWS · Terraform · CI/CD · Python · Observability tools · Linux administration · Networking fundamentals · SRE principles · Self-directed work style

Required

2-5 years of production infrastructure and monitoring experience with measurable reliability improvements
Strong experience with Kubernetes, Docker, and container orchestration in production environments
Hands-on experience with CI/CD tools and infrastructure as code (Terraform preferred)
AWS experience with multi-service deployments and Python programming skills for automation
Self-directed work style with the ability to own projects from conception to production in fast-moving environments
Understanding of SRE principles, SLOs/SLIs, and systematic approaches to system reliability

Preferred

Experience with observability tools (Vector, Loki, Grafana, Prometheus) in production environments
Familiarity with HashiCorp Vault, Okta, or similar identity/secrets management systems
Previous experience scaling infrastructure at high-growth companies (startup to 100+ employees)
AWS certification or demonstrated expertise with advanced cloud networking and security
Linux system administration experience and networking fundamentals
Interest in aerospace, telecommunications, or mission-critical systems

Company

Northwood

Northwood was founded by Bridgit Mendler, Griffin Cleverly, and Shaurya Luthra with the mission to expand access to space by transforming satellite backhaul infrastructure.

Funding

Current Stage: Early Stage
Total Funding: $36.4M
Key Investors: Harvard Innovation Labs
2025-04-22 · Series A · $30M
2024-02-19 · Seed · $6.3M
2023-02-08 · Grant · $0.1M

Leadership Team

Bridgit Mendler
CEO – Cofounder
Griffin Cleverly
Co-Founder, CTO
Company data provided by crunchbase