
Oak Ridge National Laboratory · 1 day ago

Senior HPC Linux Systems Engineer

Oak Ridge National Laboratory (ORNL) is seeking a Senior HPC Linux Systems Engineer to serve as a technical leader supporting some of the most advanced computing environments in the world. The role involves leading the design, implementation, and optimization of complex HPC infrastructure while managing large-scale technical projects and serving as a trusted advisor to scientific and operational leadership.

Advanced Materials, Clean Energy, Energy, Energy Management, Manufacturing, Nuclear, Renewable Energy
No H1B, Security Clearance Required, U.S. Citizen Only

Responsibilities

Provide technical leadership in the design, integration, and administration of large-scale Linux-based HPC clusters, high-speed networks, and storage systems
Lead medium to large technical projects, coordinating requirements, schedules, and deliverables across internal and external stakeholders
Architect and deploy advanced infrastructure solutions supporting exascale-class and mission-critical computing environments
Serve as a technical mentor for HPC engineers, guiding best practices in automation, performance tuning, and system security
Develop, implement, and maintain configuration management and automation frameworks (e.g., Ansible, Puppet, Salt) to enhance reliability and reproducibility
Perform advanced system performance analysis, troubleshooting, and optimization, ensuring system scalability and long-term sustainability
Manage critical vendor and partner relationships, representing ORNL’s technical requirements during procurement, integration, and system acceptance
Contribute to strategic planning and technology roadmaps, influencing unit goals and technical direction
Collaborate closely with scientists, researchers, and IT specialists to align infrastructure capabilities with research and security objectives
Ensure compliance with DOE cybersecurity standards, configuration baselines, and operational policies
Author technical documentation, present internal briefings, and communicate complex issues and resolutions to management and stakeholders
Participate in on-call rotations, maintenance windows, and incident response as needed to support 24x7 operations

Qualifications

HPC systems engineering, Linux systems administration, Automation frameworks, Batch schedulers, Advanced scripting, High-speed interconnects, Identity management systems, Virtualization solutions, Technical documentation, Communication skills, Mentoring

Required

Bachelor's degree in computer science, engineering, or a related technical field
A minimum of 8 years of relevant experience in Linux systems administration or HPC systems engineering

Preferred

Demonstrated experience leading the design and deployment of HPC or large-scale distributed computing systems
Expertise with batch schedulers (SLURM, PBS, LSF) and parallel file systems (Lustre, GPFS/Spectrum Scale)
Proven ability to lead technical projects from concept through implementation, balancing technical depth with project delivery
Strong proficiency in automation and infrastructure-as-code frameworks (Ansible, Puppet, Salt)
Advanced scripting or programming skills (Python, Bash, Go) for automation and operational tooling
In-depth understanding of high-speed interconnects (InfiniBand, Slingshot, Ethernet) and storage architectures
Experience managing identity and access management systems, including MFA, SSO, and zero-trust frameworks (PingFederate, RSA SecurID, Entra ID)
Experience integrating virtualization or containerization solutions (VMware, KVM, Apptainer, Podman) into HPC environments
Ability to manage client and stakeholder relationships across multiple directorates and technical disciplines
Excellent written and verbal communication skills, including the ability to present complex technical concepts to diverse audiences
Proven ability to influence technical strategy and mentor staff in a collaborative research environment

Benefits

Work on the world’s most powerful supercomputers, including Frontier, the first system to achieve exascale performance.
Enable breakthrough science in fields like fusion energy, climate modeling, AI, and national security.
Collaborate with diverse teams of scientists, engineers, and technologists from across the DOE complex and academia.
Grow your career in a mission-driven, innovation-focused environment with access to professional development and leadership opportunities.
Enjoy life in East Tennessee, with a thriving research community, scenic outdoor recreation, and a high quality of life.

Company

Oak Ridge National Laboratory

Oak Ridge National Laboratory holds a range of R&D assignments, from fundamental nuclear physics to applied R&D on advanced energy systems.

Funding

Current Stage: Late Stage
Total Funding: $9.8M
Key Investors: US Department of Energy
2023-09-21 · Grant · $4.8M
2023-07-27 · Grant
2022-03-14 · Grant · $5M

Leadership Team

Arjun Shankar
Director, Compute and Data Environment for Science
Brett Ellis
Group Lead - Business Systems Operations
Company data provided by Crunchbase