Oak Ridge National Laboratory · 21 hours ago
HPC Linux Systems Engineer
Oak Ridge National Laboratory (ORNL) is seeking highly motivated HPC Linux Systems Engineers to join teams operating advanced computing environments. The successful candidates will help architect, deploy, and maintain HPC systems that support scientific and mission objectives across various research areas.
Responsibilities
Install, integrate, and administer Linux-based HPC clusters, storage systems, and high-speed networks
Monitor and optimize system performance, reliability, and scalability for large-scale computational workloads
Diagnose complex hardware and software issues, coordinating with vendors and internal engineering teams to implement solutions
Participate in system design, deployment, acceptance testing, and upgrades for leadership-class and research computing systems
Develop and maintain automation, configuration management, and monitoring solutions using tools such as Ansible, Puppet, Bash, or Python
Collaborate with scientists, researchers, and technical staff to ensure HPC resources effectively support scientific and mission objectives
Support identity management, authentication, and access control frameworks to maintain secure and compliant environments
Document system architectures, processes, and best practices, and contribute to internal knowledge sharing
Participate in on-call rotations and off-hours maintenance windows as required to support 24x7 operations
Qualifications
Required
Bachelor's degree in computer science, engineering, or a related field
A minimum of 5 years of experience in Linux systems administration, or an equivalent combination of education and experience
Preferred
Experience administering HPC clusters or large-scale Linux computing environments
Familiarity with batch schedulers (e.g., Slurm, PBS, LSF) and parallel file systems (e.g., Lustre, GPFS/Spectrum Scale)
Experience implementing and managing automation and configuration management frameworks (Ansible, Puppet, Salt)
Proficiency in scripting or programming (Python, Bash, Go)
Understanding of networking fundamentals and high-speed interconnects (InfiniBand, Ethernet)
Experience deploying or supporting identity management and multi-factor authentication systems (PingFederate, RSA SecurID, Entra ID)
Familiarity with virtualization or containerization technologies (VMware, KVM, Podman, Apptainer)
Experience troubleshooting and tuning high-performance storage, networking, and compute systems
Excellent communication, collaboration, and problem-solving skills
Demonstrated ability to lead or contribute to complex technical projects with minimal supervision
Benefits
Work on the world’s most powerful supercomputers, including Frontier, the first system to achieve exascale performance.
Enable breakthrough science in fields like fusion energy, climate modeling, AI, and national security.
Collaborate with diverse teams of scientists, engineers, and technologists from across the DOE complex and academia.
Grow your career in a mission-driven, innovation-focused environment with access to professional development and leadership opportunities.
Enjoy life in East Tennessee, with a thriving research community, scenic outdoor recreation, and a high quality of life.
Company
Oak Ridge National Laboratory
Oak Ridge National Laboratory holds a range of R&D assignments, from fundamental nuclear physics to applied R&D on advanced energy systems.
Funding
Current Stage: Late Stage
Total Funding: $9.8M
Key Investors: US Department of Energy
2023-09-21 · Grant · $4.8M
2023-07-27 · Grant
2022-03-14 · Grant · $5M
Company data provided by Crunchbase