Senior HPC Engineer/Administrator (IMC - 001) jobs in United States

SageCor Solutions · 3 weeks ago

Senior HPC Engineer/Administrator (IMC - 001)

SageCor Solutions is a growing company specializing in engineering services and high performance computing. The Senior HPC Engineer/Administrator will be responsible for managing and providing technical support for HPC systems, ensuring system integrity, and optimizing operations in a research-driven environment.

Hardware · Information Technology · Software
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Configure and manage Linux and Windows (or other applicable) operating systems; install operating system software; troubleshoot, configure, and maintain the integrity of network components; and implement operating system enhancements to improve security, reliability, and performance
Administer, monitor, and maintain HPC systems, including compute nodes, storage, networking, and software stacks
Provide support for IT systems, including day-to-day operations, monitoring, and problem resolution for client, server, storage, network, and mobile devices
Implement and maintain automation tools for system provisioning, configuration management, and monitoring
Provide support for implementation, troubleshooting and maintenance of IT systems
Manage the daily activities of configuration and operation of IT systems
Provide assistance to users in accessing and using IT systems
Optimize system operations and resource utilization, and perform system capacity analysis and planning
Apply in-depth experience in troubleshooting IT systems
Analyze and resolve complex problems associated with server hardware, applications and software integration
Contribute to performance benchmarking, system tuning, and capacity planning
Support researchers by providing technical expertise and resolving IT-related roadblocks or issues
Document system administration procedures and contribute to knowledge-sharing initiatives

Qualifications

Linux administration · HPC systems management · Scripting (C) · Scripting (Python) · VPN configuration · System automation (Ansible) · Containerization (Docker) · Technical support · Problem resolution · Documentation

Required

Active TS/SCI with polygraph required
Experience administering Linux-based servers and HPC clusters, including job schedulers (e.g., Slurm, LSF, PBS)
Experience configuring and managing Virtual Private Network (VPN) clients and servers
Scripting/programming skills (C and Python)
Knowledge of: System automation tools (e.g., Ansible)
Knowledge of: System provisioning tools (e.g., Warewolf)
Knowledge of: Distributed storage systems (e.g., Lustre, BeeGFS)
Knowledge of: Containerization (e.g., Docker, Apptainer)
Knowledge of: Installing, maintaining and using infrastructure and performance monitoring and optimization tools (e.g., Grafana, Prometheus)
Knowledge of: Setting up and executing benchmarks in an HPC environment and analyzing their results systematically

Preferred

Preferably meets DoD 8140.01 or DoD 8570.01-M training and certification requirements

Company

SageCor Solutions

SageCor Solutions provides systems engineering, hardware design, and software design services.

Funding

Current Stage
Early Stage

Leadership Team

Adam Smith
Founder
Josh Weil
Talent Acquisition Partner
Company data provided by crunchbase