
University of Chicago · 5 days ago

HPC System Administrator

The University of Chicago is a leading research institution seeking a highly qualified HPC Systems Administrator to join its HPC Systems and Operations team. The role involves designing, deploying, configuring, and maintaining HPC clusters, and ensuring the security and compliance of large-scale, complex HPC systems used primarily for research.

Higher Education
H1B Sponsor Likely

Responsibilities

Design, deploy, configure, and administer HPC clusters, including management and compute nodes, storage infrastructure, interconnects (e.g., InfiniBand), and related systems
Develop, maintain, and enforce security procedures and system documentation for operational and compliance purposes
Implement infrastructure and security monitoring and detection systems to identify failures and unusual activity and to respond to automated alerts
Tune, secure, and maintain the HPC job scheduling environment, including fair-sharing, accounting, and policy enforcement
Troubleshoot and resolve operational, performance, and security-related issues across HPC hardware and software stacks
Coordinate with hardware and software vendors to address defects, vulnerabilities, and performance issues
Assist the Computational Scientists team with user support and helpdesk tickets, including elevated support for security-protected environments
Implement and maintain secure and reliable backup, archival, disaster-recovery, and restore capabilities for systems and research data
Perform vulnerability scanning, patch management, and system and firmware updates across the infrastructure
Maintain complex system and network administration functions
Work with moderate guidance to administer simple systems and assist in the administration of larger systems
Maintain all supporting documentation for comprehensive operating system, hardware, and software configuration
Monitor primary responses for information-technology-related security incidents and violations
Keep current with new security and network monitoring technologies and with applicable laws and regulations
Perform other related work as needed

Qualifications

Linux administration
HPC cluster management
Job scheduler management
Scripting (Python/Bash)
Automation tools (Ansible, Puppet)
Monitoring tools (CheckMK, Zabbix)
Distributed storage systems
Security best practices
Analytical skills
Problem-solving ability

Required

Minimum requirements include a college or university degree in a related field
Minimum requirements include knowledge and skills developed through 2-5 years of work experience in a related job discipline

Preferred

Linux system administration experience in a large, distributed computing environment
Demonstrated experience and knowledge of system security and best practices
Knowledge of Linux administration (RHEL) is required
Experience and advanced skills in scripting with Python or Bash
Experience installing, configuring, and managing job schedulers (e.g., Slurm, Torque, PBS, LSF)
Experience with automation tools such as Ansible, Puppet, Chef, Salt
Experience with provisioning tools (e.g., xCAT, Confluent, Warewulf)
Experience implementing monitoring tools (e.g., CheckMK, Zabbix, Nagios)
Knowledge of frameworks and federal regulations to protect regulated systems and data (e.g., HIPAA, FISMA, NIST CSF)
Experience working with, documenting, and enforcing controls required to protect controlled unclassified information (e.g., NIST SP 800-53, NIST SP 800-171, NIST SP 800-223, FIPS)
Knowledge of and practical experience with at least one distributed storage system (e.g., Storage Scale, Lustre, Gluster, BeeGFS, Ceph)
Experience with InfiniBand (must at least be able to demonstrate a working knowledge of concepts)
Experience in writing precise and concise documentation and standard operating procedures
Understand and translate researchers' scientific goals into computational requirements
Work well with faculty and researchers
Identify and gain expertise in appropriate new technologies and/or software tools
Function as part of an interactive team while demonstrating self-initiative to achieve project goals and the Research Computing Center's mission
Strong analytical skills and problem-solving ability

Benefits

Health
Retirement
Paid time off

Company

University of Chicago

One of the world’s great intellectual destinations, the University of Chicago empowers scholars and students to ask tough questions, cross disciplinary boundaries, and challenge conventional thinking to enrich human life around the globe.

H1B Sponsorship

The University of Chicago has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (341)
2024 (318)
2023 (285)
2022 (233)
2021 (179)
2020 (172)

Funding

Current Stage
Late Stage

Leadership Team

Benedicte Nolens
Distinguished Executive in Residence
Company data provided by Crunchbase