SpaceX · 2 weeks ago
Linux Site Reliability Engineer
SpaceX is actively developing technologies to enable human life on Mars. They are seeking an experienced Linux Site Reliability Engineer to provide expertise in Kubernetes design, maintenance, scaling, and optimization to support critical business functions.
Advanced Materials · Aerospace · Manufacturing · National Security · Space Travel
Responsibilities
Install, manage, scale and optimize Kubernetes and RKE clusters using Ansible, Terraform and adjacent technologies in production environments
Work closely with other SpaceX engineers to gather requirements, research, evaluate, design, plan, deploy, and support software platforms and related technologies running in Kubernetes within a world-class environment that meets the needs of the demanding SpaceX engineering teams. Build highly resilient, high-performance, scalable, and robust systems
Exercise a high degree of personal responsibility for the processes, systems, and tools you create and manage, all in support of the goal of making humanity an interplanetary species
Make recommendations, justify, and implement improvements using an accepted change control methodology
Work within a diverse group to design and deliver creative solutions and resolve problems in a timely and proactive manner by interacting with internal business units
Define, document and follow standards and best practices for systems design, testing, and implementation
Foster an environment of collaboration and cross-training, upskilling the team in Kubernetes expertise and ensuring peers are developed into capable engineers
Drive scripting, self-service, and automation to reduce administrative overhead and toil
Participate in on-call rotation to handle urgent after-hours work when necessary
Qualifications
Required
Bachelor's degree in Computer Science or a STEM discipline and 3+ years of systems engineering experience; OR 5+ years of systems engineering experience in lieu of a degree
Experience deploying and supporting Linux servers in physical and virtualized environments (e.g. VMware via automation)
Experience with the Linux shell as well as configuring and extending Linux instances (e.g. kernel modules, cgroups, PKI, iptables, interfaces)
Experience supporting and scaling containerized applications in Linux environments
Experience using automation frameworks (e.g. Ansible, Terraform) to manage provisioning and post-provisioning lifecycles of infrastructure and Kubernetes installations
Preferred
Expertise in creating repeatable, reliable, scalable systems architectures, with high availability, fault tolerance, performance tuning, monitoring, and statistics/metrics collection
Expertise in source code version control tools such as Git and Subversion and collaborating on source code via Pull Requests and other Git-based workflows
Strong understanding of Linux container runtimes
Experience implementing configuration management, provisioning, and workflow automation solutions via Infrastructure as Code, CI/CD and GitOps (e.g. Ansible, AWX/Tower, Vagrant, Puppet, Redfish, Jenkins, cloud-init, ArgoCD, etc.)
Experience writing test automation to ensure backwards compatibility of feature and change development for automation processes and Kubernetes deployments
Experience with programming and scripting languages such as Python and Golang to develop software solutions and integrate with external systems to implement automation against RESTful API services
Experience installing, configuring and troubleshooting Kubernetes internals, CNI, CRI and CSI plugins (e.g. Docker, CRI-O, Ceph, Cilium), load balancing (e.g. MetalLB), service mesh (e.g. Istio) and software-defined storage (e.g. rook-ceph) in cloud or on-premises environments
Experience developing solutions using Kubernetes patterns to extend system functionality and solve custom use cases (e.g. webhooks, controllers, operators, sidecars)
Experience implementing proactive alert/monitoring workflows and dashboards for Linux systems and Kubernetes deployments using Prometheus, Grafana, InfluxDB or similar technologies
Experience with dynamic system configuration templating using Jinja, Jsonnet, YAML and Helm
Benefits
Long-term incentives, in the form of company stock, stock options, or long-term cash awards
Potential discretionary bonuses
Ability to purchase additional stock at a discount through an Employee Stock Purchase Plan
Comprehensive medical, vision, and dental coverage
401(k) retirement plan
Short- and long-term disability insurance
Life insurance
Paid parental leave
Various other discounts and perks
3 weeks of paid vacation
10 or more paid holidays per year
Paid sick leave pursuant to Company policy
Company
SpaceX
SpaceX is an aviation and aerospace company that designs, manufactures, and launches rockets and spacecraft.
Funding
Current Stage: Late Stage
Total Funding: $11.78B
Key Investors: Korea Investment Partners, Intesa Sanpaolo, Andreessen Horowitz
2025-12-12 · Secondary Market
2025-09-10 · Secondary Market
2025-08-13 · Secondary Market · $10M
Company data provided by Crunchbase