By Light Professional IT Services · 4 days ago
Lead Site Reliability Engineer
By Light Professional IT Services LLC readies warfighters and federal agencies with technology and systems engineered to connect, protect, and prepare individuals and teams for whatever comes next. The Lead Site Reliability Engineer will lead a technical team to ensure maximum uptime of a large private and hybrid cloud environment, overseeing infrastructure, hypervisors, and applications while fostering customer relationships and addressing system deficiencies proactively.
Government · Information Services · Information Technology
Responsibilities
Lead a technical team of SREs to ensure maximum uptime of a large private and hybrid cloud environment
Oversight includes all core infrastructure, hypervisors, Kubernetes, and applications
Ensure continuity of operations for end users
Proactively find system deficiencies, prioritize backlogs, and lead the team to correct issues before they impact users
Coordinate with development engineering teams and follow program processes to ensure controlled changes to production systems
Identify and implement automation, observability, and configuration updates throughout the system
Work as part of a fast-moving, highly diverse team across multiple projects simultaneously
Must have experience leading a technical team of engineers, including performing supervisory duties to ensure successful team outcomes
Must be able to understand complex enterprise IT system architectures (hardware and software) with the ability to pinpoint the team’s focus areas and prioritizations to maximize impact
Must be a self-starter in a fast-paced environment and able to work with a multi-faceted technical team of engineers with a diverse set of skills at differing levels of experience
Establish processes and procedures for maintaining infrastructure configuration management across the enterprise, including automated system checks against the configuration-managed baseline
Must be able to effectively manage processes associated with the team’s use of Git repositories to ensure proper configuration management
Analyze real-world problems and implement solutions according to corporate and government guidelines, procedures, and industry best practices
Ensure system stability, accessibility, and proper configuration of assigned technical systems and components
Be sensitive and flexible to the needs and requirements of the customer
Must be comfortable with Linux system management, since Linux is the primary operating system, and with other Red Hat enterprise services
Experience with VMware Cloud Foundation and the overarching VMware hypervisor and management stack and services
Operate and maintain physical switches, routers, and IPS/IDS devices
Operate and maintain a VMware NSX-based SDN
Understand Kubernetes and microservice-based applications
Must be able to author and maintain scripts and automation in a managed Git repository
Proactively identify, locate, mitigate, and resolve system issues
Document and present performance records and results
Document problems and issues in a Jira-based ticketing system/tracking log
Review and update all information security management system processes and procedures (data/software)
Qualifications
Required
Bachelor's degree in a technical discipline such as computer science or information technology from an accredited college or university. Will consider additional degrees with accompanying technical leadership experience
Ten years of work experience preferred
Security+ certification is required or must be completed within six months of hire
Preferred
Experience with Ansible (or similar) infrastructure automation to deploy and configure standard baselines for physical devices and in a virtual environment (VMware and Linux)
Utilize scripting for automating tasks, administration, data collection and reporting. This job requires the ability to write, test, debug, deploy, and maintain scripts
Ability to audit and report system and performance logs
Ability to automate, administer and maintain the environment through configuration management, version control, and backups with the use of scripting
Ability to respond professionally, effectively, and efficiently to service requests
Ability to prioritize multiple tasks, projects, and demands
Ability to research and/or implement new technologies
Effective interpersonal and communications skills
Professionally convey system-wide performance information routinely via tools such as PowerPoint, Excel, Visio, etc.
Train others to perform similar design and administrative tasks
Interact with vendors/users/customers and developers to understand needs and operational requirements that will impact development and testing activities
Examine any relevant change implementation, then report the changes to developers and testers, welcoming feedback for future improvements
Ability to solve technical problems involving a variety of integrated software and hardware platforms
Knowledge of or experience with the DoD Risk Management Framework (RMF) and National Institute of Standards and Technology (NIST) or similar best practices and security guidelines
Ability to assist others with getting certifications, such as providing guidance, mentoring, sandboxes, or cooperation
Company
By Light Professional IT Services
BY LIGHT Professional IT Services is a provider of IT, cloud, cyber and infrastructure solutions to the US Federal Government.
Funding
Current Stage: Late Stage
Total Funding: unknown
2017-05-31: Acquired
Company data provided by crunchbase