Leidos · 11 hours ago

Site Reliability Engineer

United States

Full-time

Remote

Mid, Senior Level

$87K/yr - $157K/yr

4+ years exp

Leidos is a leading company focused on providing innovative solutions in various domains. They are seeking a Site Reliability Engineer to ensure the reliability, performance, and scalability of complex distributed systems, while developing automated testing frameworks and maintaining system performance.

Computer · Government · Information Services · Information Technology · National Security · Software

No H1B

Security Clearance Required

U.S. Citizen Only

Responsibilities

Work alongside the development and operations teams to ensure speedy and reliable software deployments, monitor systems, and improve overall reliability of the platform. In addition, as you discover and document system bugs, you have the motivation to go off and fix them yourself

Develop features using the AI coding tool and the repository of scripts to automate, scale, test, and secure the cloud infrastructure and pipelines

Enhance performance monitoring of the various systems via Splunk or other dashboard reporting tools

Identify performance bottlenecks and optimize the performance of cloud infrastructure

Contribute to continuing our SRE journey by suggesting ways to improve engineering build, maintenance, automation and reliability across the platform with SRE/DevOps tools and Infrastructure-as-Code

Develop and code high-quality pipeline automation workflows, supporting work both inside and outside the cloud platform, that align with business and technology strategies

Develop and execute test strategies that simulate real-world failure scenarios, including network disruptions, hardware failures, and system overloads

Create, script, and run performance tests to measure system behavior under varying levels of load and traffic. Identify bottlenecks, performance degradation, and areas for optimization

Design, implement, and maintain automated test suites for infrastructure and application components. Ensure that testing is integrated into the CI/CD pipeline to validate system reliability with every release

Build automated systems for continuous performance testing, stress testing, and load testing

Work closely with SREs, developers, and operations teams to define reliability goals and develop appropriate testing strategies to validate those goals

Ensure that new services and features undergo thorough testing for performance, reliability, and failure recovery before deployment to production

Validate that monitoring, logging, and alerting mechanisms are functioning correctly by testing systems under failure conditions

Ensure that Service Level Indicators (SLIs) and Service Level Objectives (SLOs) are accurately measured and tracked through automated testing frameworks

Resolve most conflicts between timeline, budget, and scope independently, while using judgment to escalate complex or consequential issues to senior management

Qualifications

Site Reliability Engineering · CI/CD toolsets · Containerization (Docker) · Cloud infrastructure AWS · Cloud infrastructure Azure · Infrastructure as Code · Automated testing frameworks · Scripting Python · Scripting Bash · Performance monitoring (Splunk) · Agile methodologies · ITILv4 certification · Ansible · Jira · DevSecOps concepts · Collaboration skills

Required

Requires BS degree and 4-8 years of prior relevant experience or Master's with 2-6 years of prior relevant experience

Must currently possess, and be able to maintain, an active DoD Secret security clearance

Minimum of DoD 8570.01 IAT Level II Certification required prior to onboarding and must maintain certification while supporting the SMIT Contract

Must be able to support program execution in classified environments and access SIPRNet from an NMCI location on short notice (local travel)

Experience with automated script design, coding, debugging, and maintenance (using Bash, Python, etc.) preferred

Experience with CI/CD toolsets (e.g., Jenkins, GitLab)

Experience with Containerization (Docker) and Container Orchestration (Kubernetes)

Experience in application administration, configuration, and integration

Familiarity with agile development methodologies

Skilled and disciplined enough to work with a distributed team

Ability to work in a highly collaborative, forward-thinking, and innovation-driven environment

Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge

Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)

Experience creating Jira and/or Azure DevOps workflows, projects, and custom configurations

Experience administering and maintaining the SRE platform via Ansible playbooks (e.g., upgrading Jenkins)

Experience automating tasks with scripting languages such as PowerShell or Python

Experience integrating with and maintaining various third-party CI/CD tools such as Jenkins and GitLab

Experience with commercial cloud infrastructure deployment environments such as AWS and Azure

Experience with automated provisioning and configuration tools such as Terraform, CloudFormation, Chef, Puppet, Ansible, or similar technologies

Working knowledge of the Risk Management Framework (RMF) and DISA STIGs

Preferred

Previous work experience providing support to the NGEN-NMCI program

Experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation for automating test environments

ITILv4, Scrum Master, or Agile SAFe certification(s) or applicable experience

Company

Leidos

Glassdoor: 3.9

Leidos is a Fortune 500® innovation company rapidly addressing the world’s most vexing challenges in national security and health.

Founded in 1969

Reston, Virginia, USA

10001+ employees

https://www.leidos.com/

Funding

Current Stage

Public Company

Total Funding

unknown

2025-02-20 · Post-IPO Debt

2013-09-17 · IPO

Leadership Team

James Carlini

Chief Technology Officer

Theodore Tanner

Chief Technology Officer

Recent News

Fortune

Fortune 500 Power Moves: Which executives gained and lost power this week

2025-12-20

MarketScreener

Leidos Names Theodore Tanner Chief Technology Officer

2025-12-16

Benzinga.com

Citigroup, Leidos Holdings And More On CNBC's 'Final Trades'

2025-12-16

Company data provided by Crunchbase