LCG, Inc. · 1 month ago
High Performance Computing (HPC) / Linux Systems Engineer, Senior
LCG, Inc. is seeking a highly skilled and motivated Senior High Performance Computing (HPC) / Linux Systems Engineer to support its client. This role is critical to administering and optimizing the institute’s HPC infrastructure, which supports bioinformatics workloads, genomic data processing, and scientific research computing needs.
Health Care · Information Technology
Responsibilities
Administer, monitor, and maintain HPC cluster systems, enterprise storage (Dell EMC Isilon, Unity), and associated applications
Provide day-to-day systems administration for Red Hat, CentOS, SUSE, and Oracle Linux environments, including installing, configuring, and maintaining Red Hat and CentOS systems in enterprise HPC environments
Manage and optimize HPC clusters using Bright Cluster Manager to ensure reliability, scalability, and performance
Support and manage HPC job scheduling workflows, including Altair/Univa Grid Engine and equivalent schedulers
Troubleshoot complex HPC infrastructure issues in collaboration with client teams, vendors, and NIH partners
Lead and support infrastructure enhancement projects, system upgrades, and modernization initiatives
Maintain and configure enterprise backup and disaster recovery solutions (Cohesity, VMware, Isilon OneFS)
Perform data backup/restore operations across on-prem and cloud (AWS/Azure) storage systems and participate in quarterly and annual DR tests
Administer NAS, iSCSI, FC SAN storage devices, and support Dell, EMC, and Isilon hardware platforms
Perform data restoration from Dell PowerVault Tape Libraries and manage tape-based backup workflows
Maintain and troubleshoot enterprise file systems including XFS, NFS, SMB, and CIFS
Support Infrastructure as Code efforts and modernization using containers, Kubernetes, and cloud automation
Utilize Ansible or Puppet configuration management tools to ensure consistency and repeatability across systems
Write and maintain automation scripts using Shell, Perl, or Python for system administration and HPC workflows
Ensure security of HPC and Linux systems by addressing vulnerabilities, updating configurations, and maintaining NIH/HHS compliance
Implement and manage security and vulnerability tools such as BigFix, Carbon Black, and Nessus
Support secure system configurations in alignment with CIS Benchmarks, NIST 800-53, and NIH standards
Administer and troubleshoot DNS, Active Directory integration, and firewall configurations for HPC and Linux environments
Manage user accounts, permissions, and data access across HPC clusters, Active Directory, cloud platforms, and hybrid environments
Support scientific computing and bioinformatics workflows, including installation and management of user-requested tools and software
Provide expert support in compiling, installing, and testing open-source and commercial scientific applications across HPC environments
Collaborate with researchers to ensure optimized performance for complex workloads
Prepare comprehensive documentation including SOPs, system diagrams, contingency plans, and COOP/DR readiness materials
Contribute to incident, configuration, and change management practices following ITIL principles
Conduct research and present quarterly innovation updates to leadership
Participate in requirements gathering with technical and non-technical stakeholders and provide expert recommendations
Coordinate vendor demonstrations, tool evaluations, and strategic technology assessments
Perform hands-on datacenter tasks such as racking/un-racking, hardware installation, and diagnostics
Qualifications
Required
Bachelor's degree in Computer Science, Information Technology, or a related technical field (or equivalent experience)
7+ years of experience in systems engineering and administration in Linux-based environments
Demonstrated experience supporting high-performance computing environments, scientific workloads, and large-scale storage systems
Required certifications: Red Hat Certified Engineer (RHCE) or Red Hat Certified System Administrator (RHCSA), VMware VCP, AWS/Azure certifications, Security+, ITIL 4
Experience with Apache, MySQL/PostgreSQL, and additional enterprise file systems (XFS, SMB, CIFS)
Experience compiling, installing, and testing open-source or COTS scientific applications
Proficiency in Linux shell scripting and tools (Bash, Python, etc.)
Experience with HPC job schedulers (e.g., Univa Grid Engine or equivalent)
Experience with VMware vSphere, vCenter, vSRM, and ESXi
Experience with Active Directory and Group Policy Objects (GPO)
Experience with networked storage: NAS/SAN, tape libraries, and backup platforms
Experience with monitoring tools (e.g., Dynatrace, VMware vROps)
Experience with secure system configuration and compliance frameworks (e.g., CIS Benchmarks, NIST 800-53, NIH security handbook)
Experience supporting hybrid infrastructure environments (on-premises, AWS, and Azure)
Strong documentation and communication skills with the ability to convey complex technical information to non-technical stakeholders
Preferred
Dell DCA/DCS certifications
Experience in a bioinformatics or life sciences research environment
Knowledge of genomic data analysis pipelines and tools used in Next-Generation Sequencing (NGS)
Familiarity with system modernization practices including Kubernetes, HCI, VDI/DaaS, and cloud-native technologies
Benefits
Health insurance options (medical, dental, vision)
Life and disability insurance
Retirement plan contributions
Paid leave
Federal holidays
Professional development
Lifestyle benefits
Company
LCG, Inc.
LCG is an information technology company specializing in scientific research support, grants management, and health IT services.