
BAE Systems, Inc. · 13 hours ago

Senior Site Reliability Engineer

BAE Systems, Inc. is seeking a seasoned Senior Site Reliability Engineer (SRE) to ensure the reliable deployment, operation, and continuous improvement of its digital engineering software tools across BAE Systems factories in North America. The role involves guiding cross-functional teams to maintain mission-critical microservices and improving system reliability through technical expertise and leadership.

Defense & Space
H1B Sponsor Likely

Responsibilities

Monitor, troubleshoot, and resolve production incidents, ensuring rapid root‑cause analysis and long‑term fixes
Design, build, and maintain automated deployment pipelines for the digital engineering software suite using asset/inventory management tools
Deploy, configure, and operate the observability stack (Prometheus, Grafana, Fluent Bit, Loki) to provide real‑time metrics, logs, and tracing for all services
Monitor and troubleshoot PostgreSQL database health, performance, and replication issues; implement automated alerts and remediation
Use Consul for service discovery and health checking of gRPC microservices; ensure service mesh reliability and failover handling
Define and track SLIs/SLOs, error budgets, and reliability targets for each factory site; drive root‑cause analysis and post‑mortems for incidents
Lead incident response, on‑call rotations, and runbooks; mentor junior engineers in debugging distributed systems
Collaborate with software developers, factory operations, and external vendors to embed reliability into the software development lifecycle
Evaluate emerging tools and technologies that can improve observability, automation, or performance while staying aligned with our on‑premise strategy (no public cloud platforms)
Automate operational tasks and create self‑service tooling to reduce manual overhead

Qualifications

Site Reliability Engineering
Observability tools
Automation/orchestration tools
Scripting/programming (Python)
PostgreSQL database management
Windows systems expertise
Networking skills
Communication skills
Problem-solving skills
Documentation abilities

Required

Bachelor's degree in Computer Science, Electrical Engineering, or related field
Minimum 4 years of experience in site reliability, DevOps, or systems engineering within a high‑volume, multi‑site manufacturing or industrial environment
Deep expertise in Windows systems, networking, and version-control workflows
Experience with observability tools: Prometheus, Grafana, Fluent Bit, Loki
Proficiency in automation/orchestration tools such as Ansible (or equivalent inventory‑management solutions)
Strong scripting/programming skills (Python or similar) for building custom monitoring and remediation logic
Excellent communication, problem‑solving, and documentation abilities; comfortable working in a fast‑paced, deadline‑driven environment

Preferred

Experience with Industry 4.0 and digital transformation initiatives in manufacturing
Prior work integrating on‑premise monitoring stacks with microservice architectures
Experience monitoring and maintaining PostgreSQL databases in production
Familiarity with service‑discovery and health‑checking using Consul, especially for gRPC services
Strong grasp of data collection, management, and analysis, including data collection and integration from various sources, data management and storage solutions, data analysis and visualization techniques, and data-driven decision-making and problem-solving

Benefits

Health, dental, and vision insurance
Health savings accounts
A 401(k) savings plan
Disability coverage
Life and accident insurance
Employee assistance program
Legal plan
Discounts on things like home, auto, and pet insurance
Paid time off
Paid holidays
Paid parental leave
Military leave
Bereavement leave
Any applicable federal and state sick leave
Company recognition program to receive monetary or non-monetary recognition awards

Company

BAE Systems, Inc.

Improving the future and protecting lives is an ambitious mission, but it’s what we do. BAE Systems, Inc. is the U.S.

H1B Sponsorship

BAE Systems, Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2023 (2)
2022 (2)
2021 (4)
2020 (6)

Funding

Current Stage
Late Stage

Leadership Team

Tom Arseneault
President & Chief Executive Officer, BAE Systems, Inc.
Don Widener, PhD
Chief Technology Officer, Intelligence Solutions
Company data provided by Crunchbase