COLSA · 13 hours ago
Linux/HPC Systems Engineer
COLSA, a company focused on delivering mission-critical solutions, is seeking a Linux/HPC Systems Engineer. The role involves managing and optimizing high-performance computing systems, ensuring infrastructure stability, and implementing advanced security measures.
Cyber Security · Information Technology · Software
Responsibilities
Architect & Deploy: Lead the design and lifecycle management of mission-critical Linux workstations, enterprise-grade servers, and high-performance computing (HPC) clusters
Engineer Filesystems: Master the art of data movement. Administer complex local and distributed filesystems (Lustre, GPFS/Spectrum Scale) to ensure extreme-speed access across the fabric
Infrastructure as Code (IaC): Treat the data center as a codebase. Develop sophisticated automation workflows using Python, Bash, and Ansible to eliminate manual toil and ensure drift-free configurations
Defensive Engineering: Implement "Hardened by Design" security. Fine-tune SELinux policies and advanced firewall configurations to protect sensitive data without sacrificing computational performance
Container Orchestration: Modernize scientific workflows by deploying and managing isolated environments using Podman while working to establish a Kubernetes environment
HPC Performance Tuning: Push the limits of the silicon. Optimize cluster scheduling and management utilizing industry-leading tools like Bright Cluster Manager and Slurm
Low-Latency Networking: Configure and optimize high-bandwidth networking, including InfiniBand fabrics, for seamless inter-node communication
Technical Documentation: Author high-fidelity playbooks and strategic architectural diagrams that serve as the blueprint for our evolving infrastructure
Qualifications
Required
Bachelor's Degree in related field or equivalent high-level professional experience in mission-critical environments
1 to 10 years of related experience
U.S. citizenship required, with an active DoD Top Secret security clearance, eligibility for SCI, and successful completion of a CI Scope Polygraph within 180 days of hire
Ability and willingness to obtain and maintain Special Access Program (SAP) eligibility
Active DoD 8570.01-M baseline certification (Security+ CE, SSCP, or equivalent)
Deep-tier professional experience in Linux systems engineering (RHEL/Rocky preferred)
Preferred
Active TS/SCI clearance with a current CI Polygraph
Advanced Certification: RHCE, RHCSA, or similar
Direct experience tuning kernel parameters and MPI libraries for large-scale distributed computing
Expertise in VMware, Nutanix, or KVM within a heterogeneous environment that includes Windows integration
Company
COLSA
COLSA's full-scale capabilities include cyber and information warfare, rapid prototyping and engineering, uncrewed systems, acquisition, logistics, studies and analysis, data science, and systems and software engineering.