Hewlett Packard Enterprise · 3 months ago
Systems Analyst (Site Reliability Engineer)
Hewlett Packard Enterprise is a global edge-to-cloud company that helps organizations manage their data and applications. They are seeking a skilled Systems Analyst (Site Reliability Engineer) to support Oak Ridge National Laboratory, focusing on the deployment, maintenance, and optimization of high-performance computing systems for scientific research.
Data Center · Enterprise Software · Information Technology · IT Management · Network Security
Responsibilities
Maintain and optimize compute infrastructure across multiple large-scale HPC systems
Participate in the deployment, testing, and validation of live high-performance computing clusters
Troubleshoot node failures by analyzing OS internals, compiler behavior, and system logs, coordinating with internal subject-matter experts as needed
Conduct routine and on-demand maintenance, troubleshooting, and performance tuning for large-scale HPC environments
Collaborate with researchers, engineers, and technical staff, opening, tracking, and closing JIRA tickets to keep systems reliable and efficient for high-stakes scientific research
Investigate and document complex software and system-level issues, acting as a bridge between users and HPE internal teams
Develop and implement automation tools, scripts, and monitoring solutions to streamline system management
Stay up to date with advancements in HPC technologies, including GPU acceleration (e.g., ROCm), parallel computation (Cray PE, MPI/OpenMP), and performance tuning
Qualifications
Required
Due to the nature of the work, this position requires either U.S. Citizenship or U.S. Lawful Permanent Resident (LPR) status
Experience using SLURM-based HPC systems, both as a user and preferably as a system administrator
Proficient in Linux, Python, and Bash scripting
Familiarity with C++/Fortran-based HPC application development, GPUs, MPI, and high-performance computing tools
Strong understanding of application build processes, including compiler configurations, library integration, and dependency management, to effectively set up environments, perform upgrades, and troubleshoot build and runtime issues
Experience in large-scale log analysis and in troubleshooting performance issues, bugs, or system failures
Strong written and verbal communication skills, with the ability to document and share knowledge effectively with internal teams and end-users
Familiarity with emerging HPC trends, system architectures, and optimization strategies
Bachelor's degree in Computer Science, Computer Engineering, or a related field with at least 2 years of experience, OR a Master's degree in Computer Science, Computer Engineering, or a related field
Benefits
Health & Wellbeing
Personal & Professional Development
Unconditional Inclusion
Company
Hewlett Packard Enterprise
Hewlett Packard Enterprise is an edge-to-cloud company that uses comprehensive solutions to accelerate business outcomes.
Funding
Current Stage: Public Company
Total Funding: $2.85B
Key Investors: Elliott Management Corp.
2025-04-15 · Post-IPO Equity · $1.5B
2024-09-10 · Post-IPO Equity · $1.35B
2015-11-02 · IPO
Company data provided by Crunchbase