
University of Chicago · 5 months ago

Sr. HPC System Administrator

The University of Chicago is seeking a highly qualified Senior HPC System Administrator to join the systems and operations team that builds and manages the HPC systems and facility operations of the Research Computing Center (RCC). The individual will be responsible for designing automated solutions, managing HPC hardware and software, and ensuring the security and performance of the systems.

Higher Education
H1B Sponsor Likely

Responsibilities

Installing, configuring, and maintaining large computer clusters/servers and software
Handling day-to-day operations of the systems, including systems administration, monitoring, and storage performance, up to and including network components. Managing the system's network switch, parallel file system, and HPC software stack and tools
Configuring the scheduling and queuing system
Diagnosing and resolving system operational problems quickly and effectively. Coordinating with vendors to resolve hardware and software problems. Assisting users with access and other help desk ticket requests or issues
Using scripting/programming skills to enable system-level automation, problem detection, security maintenance, and patch management
Building and deploying open-source software and software from vendors/partners
Providing reliable and efficient backups/restores for all managed systems
Documenting system administration procedures for routine and complex tasks
Maintaining and monitoring the security of the HPC systems and servers
Planning and installing necessary patches and upgrades for servers and their associated storage, network, communications, and peripheral subsystems. Installing and maintaining an appropriate level of intrusion detection, monitoring, and auditing software as required
Tracking compliance and maintaining documentation for hardware, software, and service inventories for management reports
Performing other related work as needed

Qualifications

Linux system administration
HPC cluster support
Job management tools
Distributed file systems
Systems automation tools
Network storage subsystems
Scientific application software
Analytical skills
Problem-solving ability
Team collaboration

Required

Minimum requirements include a college or university degree in a related field
Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline

Preferred

Master's degree in Computer Science or closely related field
Full-time Linux system administration experience in a large distributed computing environment
Previous experience in providing support for a Linux HPC cluster used for scientific research
Experience with installing, configuring, and maintaining job management tools (e.g. SLURM, Moab, TORQUE, PBS)
Experience configuring, installing and troubleshooting MPI and OpenMP
Experience with operating system deployment tools (e.g. xCAT, Rocks)
Experience configuring, administering, and supporting network storage subsystems (e.g. IBM, NetApp, DataDirect Networks, LSI)
Hands-on experience with at least one distributed file system (e.g. Spectrum Scale/GPFS, Lustre, BeeGFS, Gluster, IBRIX, PVFS)
Direct experience working with InfiniBand (must at least be able to demonstrate a working knowledge of InfiniBand concepts, OFED layers, and subnet managers)
Experience configuring, installing, tuning and maintaining scientific application software on large-scale systems
Experience supporting HPC compilers and libraries
Experience with systems automation tools such as Ansible or Puppet
Experience configuring, installing, maintaining and/or using performance monitoring and optimization tools
Ability to work well with faculty and researchers
Ability to identify and gain expertise in appropriate new technologies and/or software tools
Ability to function as part of an interactive team while demonstrating self-initiative to achieve project goals and the Research Computing Center's mission
Strong analytical skills and problem-solving ability

Benefits

The University of Chicago offers a wide range of benefits programs and resources for eligible employees, including health, retirement, and paid time off.

Company

University of Chicago

One of the world’s great intellectual destinations, the University of Chicago empowers scholars and students to ask tough questions, cross disciplinary boundaries, and challenge conventional thinking to enrich human life around the globe.

H1B Sponsorship

University of Chicago has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (341)
2024 (318)
2023 (285)
2022 (233)
2021 (179)
2020 (172)

Funding

Current Stage: Late Stage

Leadership Team

Benedicte Nolens
Distinguished Executive in Residence
Company data provided by crunchbase