High Performance Computing (HPC) / Linux Systems Engineer, Senior jobs in United States
This job has closed.

LCG, Inc. · 1 month ago

High Performance Computing (HPC) / Linux Systems Engineer, Senior

LCG, Inc. is seeking a highly skilled and motivated Senior High Performance Computing (HPC) / Linux Systems Engineer to support its client. This role is critical in administering and optimizing the institute’s HPC infrastructure, which supports bioinformatics workloads, genomic data processing, and scientific research computing needs.

Health Care · Information Technology
Growth Opportunities
No H1B · U.S. Citizen Only

Responsibilities

Administer, monitor, and maintain HPC cluster systems, enterprise storage (Dell EMC Isilon, Unity), and associated applications
Provide day-to-day systems administration for Red Hat, CentOS, SUSE, and Oracle Linux environments, including installing, configuring, and maintaining Red Hat and CentOS systems in enterprise HPC environments
Manage and optimize HPC clusters using Bright Cluster Manager to ensure reliability, scalability, and performance
Support and manage HPC job scheduling workflows, including Altair/Univa Grid Engine and equivalent schedulers
Troubleshoot complex HPC infrastructure issues in collaboration with client teams, vendors, and NIH partners
Lead and support infrastructure enhancement projects, system upgrades, and modernization initiatives
Maintain and configure enterprise backup and disaster recovery solutions (Cohesity, VMware, Isilon OneFS)
Perform data backup/restore operations across on-prem and cloud (AWS/Azure) storage systems and participate in quarterly and annual DR tests
Administer NAS, iSCSI, FC SAN storage devices, and support Dell, EMC, and Isilon hardware platforms
Perform data restoration from Dell PowerVault Tape Libraries and manage tape-based backup workflows
Maintain and troubleshoot enterprise file systems including XFS, NFS, SMB, and CIFS
Support Infrastructure as Code efforts and modernization using containers, Kubernetes, and cloud automation
Utilize Ansible or Puppet configuration management tools to ensure consistency and repeatability across systems
Write and maintain automation scripts using Shell, Perl, or Python for system administration and HPC workflows
Ensure security of HPC and Linux systems by addressing vulnerabilities, updating configurations, and maintaining NIH/HHS compliance
Implement and manage security and vulnerability tools such as BigFix, Carbon Black, and Nessus
Support secure system configurations in alignment with CIS Benchmarks, NIST 800-53, and NIH standards
Administer DNS and firewall configurations for HPC and Linux environments
Manage user accounts, permissions, and data access across HPC clusters, Active Directory, cloud platforms, and hybrid environments
Administer and troubleshoot DNS, AD integration, and firewall rules to support HPC operations
Support scientific computing and bioinformatics workflows, including installation and management of user-requested tools and software
Provide expert support in compiling, installing, and testing open-source and commercial scientific applications across HPC environments
Collaborate with researchers to ensure optimized performance for complex workloads
Prepare comprehensive documentation including SOPs, system diagrams, contingency plans, and COOP/DR readiness materials
Contribute to incident, configuration, and change management practices following ITIL principles
Conduct research and present quarterly innovation updates to leadership
Participate in requirements gathering with technical and non-technical stakeholders and provide expert recommendations
Coordinate vendor demonstrations, tool evaluations, and strategic technology assessments
Perform hands-on datacenter tasks such as racking/un-racking, hardware installation, and diagnostics

Qualifications

Linux systems administration
High Performance Computing (HPC)
Cloud computing AWS
Cloud computing Azure
Storage systems management
Red Hat Certified Engineer
VMware VCP
AWS/Azure Certifications
Security+ certification
ITIL 4 certification
Linux shell scripting
HPC job schedulers
Active Directory
Networked storage
Monitoring tools
Secure system configuration
Hybrid infrastructure support
Bioinformatics experience
Genomic data analysis
Kubernetes familiarity
Documentation skills
Communication skills

Required

Bachelor's degree in Computer Science, Information Technology, or a related technical field (or equivalent experience)
7+ years of experience in systems engineering and administration in Linux-based environments
Demonstrated experience supporting high-performance computing environments, scientific workloads, and large-scale storage systems
Required certifications: Red Hat Certified Engineer (RHCE) or Red Hat Certified System Administrator (RHCSA), VMware VCP, AWS/Azure certifications, Security+, ITIL 4
Experience with Apache, MySQL/Postgres, and additional enterprise filesystems (XFS, SMB, CIFS)
Experience compiling, installing, and testing open-source or COTS scientific applications
Proficiency in Linux shell scripting and tools (Bash, Python, etc.)
Experience with HPC job schedulers (e.g., Univa Grid Engine or equivalent)
Experience with VMware vSphere, vCenter, vSRM, and ESXi
Experience with Active Directory and Group Policy Objects (GPO)
Experience with networked storage: NAS/SAN, tape libraries, and backup platforms
Experience with monitoring tools (e.g., Dynatrace, VMware vROps)
Experience with secure system configuration and compliance frameworks (e.g., CIS benchmarks, NIST 800-53, NIH security handbook)
Experience supporting hybrid infrastructure environments (on-premises, AWS, and Azure)
Strong documentation and communication skills with the ability to convey complex technical information to non-technical stakeholders

Preferred

Dell DCA/DCS certifications
Experience in a bioinformatics or life sciences research environment
Knowledge of genomic data analysis pipelines and tools used in Next-Generation Sequencing (NGS)
Familiarity with system modernization practices including Kubernetes, HCI, VDI/DaaS, and cloud-native technologies

Benefits

Health insurance options (medical, dental, vision)
Life and disability insurance
Retirement plan contributions
Paid leave
Federal holidays
Professional development
Lifestyle benefits

Company

LCG, Inc.

LCG is an information technology company specializing in scientific research support, grants management, and health IT services.

Funding

Current Stage: Growth Stage

Leadership Team

Melissa McCullough, Executive Vice President and CFO
Carey Parrett, MBA, Vice President and Chief Delivery Officer
Company data provided by Crunchbase