Systems Reliability Engineer jobs in United States

Leidos · 5 days ago

Systems Reliability Engineer

Leidos, a Fortune 500 company working in national security and health, is seeking a Systems Reliability Engineer. The role involves troubleshooting system incidents, analyzing performance, and implementing automated solutions to improve system reliability.

Computer · Government · Information Services · Information Technology · National Security · Software
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Troubleshoot and resolve system/operational incidents
Perform root cause analysis for operational incidents
Analyze system performance and take corrective actions as needed
Coordinate with mission partners, consumer applications, and other external entities in troubleshooting enterprise incidents and integration problems
Design, develop, and implement automated solutions to proactively monitor system health, identify performance bottlenecks, and resolve system issues through automated remediation, reducing manual intervention and improving system reliability
Collect data, identify and analyze trends in Operational Incidents, and provide suggestions to mitigate common issues
Work closely with Ops Tech Lead and Development Lead to identify baseline enhancements to improve operational stability
Work with deployment and ISP teams to support baseline deployments to operations
Support off-hour calls to assist in troubleshooting when high-priority operational incidents occur

Qualifications

Oracle Identity · Access Management · COTS integration · Linux scripting · Python · Kubernetes · DevOps tools · Security+ certification · Communication skills · Problem solving skills · Interpersonal skills · Self-starter

Required

BS degree and 4+ years of prior relevant experience, or Master's degree with 2+ years of prior relevant experience
TS/SCI clearance and the ability to obtain and maintain a polygraph post-hire
Strong communication skills, both verbal and written
Ability to quickly learn new software and IT concepts
Strong problem-solving and decision-making skills
Self-starter with an ability to work in a team environment and independently
Intimately familiar with the COTS products that the program leverages: Oracle Identity and Access Management (IdAM) suite, Apache webgates, and Computer Associates (CA) API Gateway
Experience scripting in a Linux environment using Shell and Bash
Deep understanding and background in COTS integration and custom code development
Experience in at least one of the following languages: Bash, Python, Java, NodeJS
Local to DMV (DC/Maryland/Virginia) with ability to be physically present at the team's work location in Chantilly
Strong interpersonal skills and proven track record of leading technical teams, conveying technical solutions to technical and non-technical audiences
Candidate must be able to be physically present in Chantilly, VA, a minimum of 5 days a week to work with the team, with occasional meetings in Reston and/or Springfield, VA
All candidates must be U.S. citizens to be considered for the position
Security+ certification within 60 days of hire

Preferred

Kubernetes experience using Rancher RKE2 or OpenShift
Strong understanding of containers
Experience containerizing existing custom software
Knowledge of common DevOps tools such as Ansible, Argo CD, GitLab, Nexus3, and Kubernetes
Certifications in any of the following: RHCSA/RHCE, AWS Solutions Architect/DevOps Engineer, CKA/CKAD
Familiarity with modern authentication flows such as SAML, OAuth2 and OIDC

Company

Leidos is a Fortune 500® innovation company rapidly addressing the world’s most vexing challenges in national security and health.

Funding

Current Stage: Public Company
Total Funding: unknown
2025-02-20: Post-IPO Debt
2013-09-17: IPO

Leadership Team

James Carlini, Chief Technology Officer
Theodore Tanner, Chief Technology Officer
Company data provided by Crunchbase