Oak Ridge National Laboratory
Senior HPC Linux Systems Engineer
Oak Ridge National Laboratory (ORNL) is seeking a Senior HPC Linux Systems Engineer to serve as a technical leader supporting some of the most advanced computing environments in the world. The role involves leading the design, implementation, and optimization of complex HPC infrastructure while managing large-scale technical projects and serving as a trusted advisor to scientific and operational leadership.
Advanced Materials · Clean Energy · Energy · Energy Management · Manufacturing · Nuclear · Renewable Energy
Responsibilities
Provide technical leadership in the design, integration, and administration of large-scale Linux-based HPC clusters, high-speed networks, and storage systems
Lead medium to large technical projects, coordinating requirements, schedules, and deliverables across internal and external stakeholders
Architect and deploy advanced infrastructure solutions supporting exascale-class and mission-critical computing environments
Serve as a technical mentor for HPC engineers, guiding best practices in automation, performance tuning, and system security
Develop, implement, and maintain configuration management and automation frameworks (e.g., Ansible, Puppet, Salt) to enhance reliability and reproducibility
Perform advanced system performance analysis, troubleshooting, and optimization, ensuring system scalability and long-term sustainability
Manage critical vendor and partner relationships, representing ORNL’s technical requirements during procurement, integration, and system acceptance
Contribute to strategic planning and technology roadmaps, influencing unit goals and technical direction
Collaborate closely with scientists, researchers, and IT specialists to align infrastructure capabilities with research and security objectives
Ensure compliance with DOE cybersecurity standards, configuration baselines, and operational policies
Author technical documentation, present internal briefings, and communicate complex issues and resolutions to management and stakeholders
Participate in on-call rotations, maintenance windows, and incident response as needed to support 24x7 operations
Qualifications
Required
Bachelor's degree in computer science, engineering, or a related technical field
A minimum of 8 years of relevant experience in Linux systems administration or HPC systems engineering
Preferred
Demonstrated experience leading the design and deployment of HPC or large-scale distributed computing systems
Expertise with batch schedulers (SLURM, PBS, LSF) and parallel file systems (Lustre, GPFS/Spectrum Scale)
Proven ability to lead technical projects from concept through implementation, balancing technical depth with project delivery
Strong proficiency in automation and infrastructure-as-code frameworks (Ansible, Puppet, Salt)
Advanced scripting or programming skills (Python, Bash, Go) for automation and operational tooling
In-depth understanding of high-speed interconnects (InfiniBand, Slingshot, Ethernet) and storage architectures
Experience managing identity and access management systems, including MFA, SSO, and zero-trust frameworks (PingFederate, RSA SecurID, Entra ID)
Experience integrating virtualization or containerization solutions (VMware, KVM, Apptainer, Podman) into HPC environments
Ability to manage client and stakeholder relationships across multiple directorates and technical disciplines
Excellent written and verbal communication skills, including the ability to present complex technical concepts to diverse audiences
Proven ability to influence technical strategy and mentor staff in a collaborative research environment
Benefits
Work on the world’s most powerful supercomputers, including Frontier, the first system to achieve exascale performance.
Enable breakthrough science in fields like fusion energy, climate modeling, AI, and national security.
Collaborate with diverse teams of scientists, engineers, and technologists from across the DOE complex and academia.
Grow your career in a mission-driven, innovation-focused environment with access to professional development and leadership opportunities.
Enjoy life in East Tennessee, with a thriving research community, scenic outdoor recreation, and a high quality of life.
Company
Oak Ridge National Laboratory
Oak Ridge National Laboratory carries out a broad range of R&D assignments, from fundamental nuclear physics to applied R&D on advanced energy systems.
Funding
Current Stage: Late Stage
Total Funding: $9.8M
Key Investors: US Department of Energy
2023-09-21 · Grant · $4.8M
2023-07-27 · Grant
2022-03-14 · Grant · $5M
Company data provided by Crunchbase