Cadre5 · 1 week ago
Senior HPC Linux Systems Engineer
Cadre5 is a company founded in 1999 in East Tennessee, providing innovative technical solutions. They are seeking a Senior HPC Linux Systems Engineer to improve the security, performance, and reliability of the NCCS computing environments, including support for one of the fastest supercomputers in the world.
Computer Software
Responsibilities
Installing, integrating, and administering HPC Linux clusters and high-speed networks
Diagnosing system operational problems quickly and effectively
Coordinating with vendors to resolve hardware and software problems
Recommending, planning, and coordinating hardware and software changes with customer participation using change management processes
Porting and writing system management tools
Documenting system administration procedures for routine and complex tasks
Participating in a 24-hour, 7-day on-call support rotation and off-hours maintenance windows
Implementing and integrating systems into the NCCS environment and analyzing systems performance
Leading the deployment, integration, and troubleshooting of a large-scale computer system
Participating in discussions of relevant systems topics with the internal and external community of peers, contributing experiences and solutions
Mentoring junior-level staff as they join the group
Qualifications
Required
Bachelor's degree in a scientific or technical field and 8+ years of Linux systems experience are required. An equivalent combination of education and experience will be considered
The ability to obtain and maintain a Department of Energy 'Q' clearance is required. This requires US Citizenship
Preferred
Experience managing Linux operating systems in a large-scale system environment
Solid understanding of networked computing environment concepts
Experience with Linux Cluster Administration
Ability to develop and maintain programs and scripts that support and automate administrative tasks, using languages such as Bash, Python, and Go
Experience with Lustre and GPFS file systems
Experience with batch schedulers (particularly SLURM)
Experience deploying and maintaining automated configuration management software such as Puppet
Strong interpersonal and communication skills
Ability to work as a team player
Proactive and solution-oriented problem solver
Prior project and/or team leadership experience
Benefits
Excellent medical insurance, including employer-paid benefits
401(k) match
15 days PTO
10 holidays