University of Chicago · 5 days ago
HPC System Administrator
The University of Chicago is a leading research institution seeking a highly qualified HPC Systems Administrator to join its HPC Systems and Operations team. The role involves designing, deploying, configuring, and maintaining HPC clusters, and ensuring the security and compliance of large-scale, complex HPC systems used primarily for research.
Higher Education
Responsibilities
Design, deploy, configure, and administer HPC clusters, including management and compute nodes, storage infrastructure, interconnects (e.g., InfiniBand), and related systems
Develop, maintain, and enforce security procedures and system documentation for operational and compliance purposes
Implement infrastructure and security monitoring and detection systems to identify failures and unusual activity and to respond to automated alerts
Tune, secure, and maintain the HPC job scheduling environment, including fair-sharing, accounting, and policy enforcement
Troubleshoot and resolve operational, performance, and security-related issues across HPC hardware and software stacks
Coordinate with hardware and software vendors to address defects, vulnerabilities, and performance issues
Assist the Computational Scientists team with user support and helpdesk tickets, including elevated support for security-protected environments
Implement and maintain secure and reliable backup, archival, disaster-recovery, and restore capabilities for systems and research data
Perform vulnerability scanning, patch management, and system and firmware updates across the infrastructure
Maintain complex system and network administration functions
Work with moderate guidance to administer simple systems and assist in the administration of larger systems
Maintain all supporting documentation for comprehensive operating system, hardware, and software configuration
Monitor primary responses for information technology-related security incidents and violations
Keep current with new security and network monitoring technologies and with applicable laws and regulations
Perform other related work as needed
Qualifications
Required
Minimum requirements include a college or university degree in a related field
Minimum requirements include knowledge and skills developed through 2-5 years of work experience in a related job discipline
Preferred
Linux system administration experience in a large, distributed computing environment
Demonstrated experience with and knowledge of system security and best practices
Knowledge of Linux administration (RHEL) required
Experience and advanced skills in scripting with Python or Bash
Experience installing, configuring, and managing job schedulers (e.g., Slurm, Torque, PBS, LSF)
Experience with automation tools (e.g., Ansible, Puppet, Chef, Salt)
Experience with provisioning tools (e.g., xCAT, Confluent, Warewulf)
Experience implementing monitoring tools (e.g., CheckMK, Zabbix, Nagios)
Knowledge of frameworks and federal regulations to protect regulated systems and data (e.g., HIPAA, FISMA, NIST CSF)
Experience working with, documenting, and enforcing controls required to protect controlled unclassified information (e.g., NIST SP 800-53, NIST SP 800-171, NIST SP 800-223, FIPS)
Knowledge of, and practical experience with, at least one distributed storage system (e.g., Storage Scale, Lustre, Gluster, BeeGFS, Ceph)
Experience with InfiniBand (must at least be able to demonstrate a working knowledge of concepts)
Experience in writing precise and concise documentation and standard operating procedures
Understand and translate researchers' scientific goals into computational requirements
Work well with faculty and researchers
Identify and gain expertise in appropriate new technologies and/or software tools
Function as part of an interactive team while demonstrating self-initiative to achieve project goals and the Research Computing Center's mission
Strong analytical skills and problem-solving ability
Benefits
Health
Retirement
Paid time off
Company
University of Chicago
One of the world’s great intellectual destinations, the University of Chicago empowers scholars and students to ask tough questions, cross disciplinary boundaries, and challenge conventional thinking to enrich human life around the globe.
H1B Sponsorship
The University of Chicago has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (341)
2024 (318)
2023 (285)
2022 (233)
2021 (179)
2020 (172)