SLAC National Accelerator Laboratory · 5 hours ago
Senior Site Reliability Engineer
SLAC National Accelerator Laboratory jointly operates the Vera C. Rubin Observatory, a major astronomy facility that is creating a time-lapse map of the southern sky. The role involves ensuring the reliability and robustness of the Prompt Processing Framework, which detects and distributes near-real-time alerts for transient and moving objects in the night sky.
Advanced Materials · Biotechnology · Government · Life Science
Responsibilities
Ensure, through both architecture and practice, the reliable operation of the near-real-time data processing pipeline and timely delivery of alerts to downstream brokers
Design and develop software that reduces operational risk and improves system resilience, scalability, and usability, including addressing failure modes, error handling, and contention in shared resources
Improve system performance and resilience by applying architectural and systems-level optimizations to increase throughput and reduce end-to-end latency
Operate DevOps-oriented continuous deployment of services using modern distributed systems tooling and development practices (e.g., Kubernetes, Helm, ArgoCD, Kafka, Redis)
Develop monitoring dashboards and alerts for the prompt processing service and work with teammates to design and implement a sustainable on-call rotation that provides coverage during the start of observing hours in Chile (typically 2-5pm Pacific Time), with limited off-hours responsibility
Define KPIs and metrics for observability and accountability of the pipeline
Participate in the collective engineering activities of the team, including performing code reviews, acting as a troubleshooting buddy, participating in design discussions, and writing documentation to effectively capture and communicate architectural and implementation choices
Collaborate with members of the Data Management team to identify opportunities to improve tools, workflows, and operational practices
Share responsibility with the broader team for the overall success of the Data Management system, beyond the Prompt Processing Framework
Qualifications
Required
Bachelor's degree and eight years of relevant experience, or a combination of education and relevant experience designing and operating distributed systems at scale in production environments
Experience working in an SRE, DevOps, or data-intensive systems role, with responsibility for building, operating, and improving robust services
Experience engaging with modern production infrastructure (e.g., containerized services, messaging systems, and databases; see above for our current tech stack), with the ability to learn and apply new tools quickly in a production environment
Familiarity with contemporary distributed service architectures, including service-to-service communication patterns, common failure modes, and system behavior under load and scale
Fluency in at least one modern programming language (Python preferred) with experience working across the boundary between software engineering and operations
Experience working with large-scale datasets or high-throughput data processing systems, and an understanding of the operational challenges that come with data volume and velocity
Ability to communicate clearly with engineers and scientists from diverse backgrounds, including explaining technical concepts, participating in design discussions, and documenting systems and decisions
Comfort working with a high degree of autonomy, taking ownership of technical decisions and execution, while being supported by an experienced team with clear priorities and goals
Company
SLAC National Accelerator Laboratory
SLAC National Accelerator Laboratory is a U.S. Department of Energy national laboratory operated by Stanford University.
H-1B Sponsorship
SLAC National Accelerator Laboratory has a track record of offering H-1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the U.S. Department of Labor)
Trends of Total Sponsorships
2024 (3)
2023 (59)
2022 (69)
2021 (68)
2020 (41)
Funding
Current Stage
Late Stage
Total Funding
unknown
Key Investors
U.S. Department of Energy, U.S. Department of Homeland Security
2023-09-27 Grant
2023-08-16 Grant
2023-01-20 Grant
Company data provided by Crunchbase