Northrop Grumman · 2 months ago
Principal/Sr Principal HPC Systems Engineer
Northrop Grumman, a trusted provider of mission-enabling solutions for global security, is seeking candidates for the position of Principal or Sr. Principal High-Performance Computing (HPC) Systems Engineer. The role involves overseeing the design and operation of high-performance compute clusters, leading a team of systems administrators, and ensuring system performance aligns with customer requirements.
Aerospace · Data Integration · Manufacturing · Remote Sensing · Security
Responsibilities
Oversee design, deployment, and lifecycle operation of a high-performance compute cluster
Lead team of HPC Systems Administrators
Assess and respond to customer requests for cluster modifications, including oversight of requirement gathering and analysis, planning, implementation, verification/validation, and production deployment maintenance
Investigate, diagnose, and resolve acute system faults
Ensure system performance aligns with customer requirements and remains within technical, schedule, and cost constraints
Maintain software deployments
Maintain security compliance
Monitor and maintain hardware
Contribute to design of new high-performance compute clusters
Interface with user support staff
Assess new technology for benefits and risks by performing trade studies of technological function, value proposition, and deployment timeline
Assess and report on cluster operational risks and propose, plan, and deploy mitigation strategies
Qualifications
Required
A degree in a STEM field (Science, Technology, Engineering, or Math) and a minimum of 5 years of experience with a bachelor's degree, 3 years with a master's degree, or 0 years with a PhD
Demonstrated experience maintaining computational hardware through its lifecycle
Demonstrated experience analyzing and responding to customer requirements
Strong Linux systems administration proficiency (RHEL nice to have)
Strong knowledge and experience with concepts of high-performance computing system operations, including cluster management (Ansible), multi-user login environments, job scheduling (SLURM), and networked file systems
Strong knowledge and experience maintaining compliance with Security Technical Implementation Guides (STIGs)
Strong knowledge and experience with compiling software
Strong knowledge and experience monitoring and maintaining high-performance compute cluster hardware
Experience directing technical work of a small team of Linux Systems Administrators
Strong written and verbal communication skills
Candidate must be a U.S. citizen
Active U.S. Government security clearance per customer requirements
Preferred
IAT Level II certification
Experience with MPI-based implementations
Experience with high-speed, low-latency network fabrics
Experience with parallel file systems
Experience with GPUs
Benefits
Health insurance coverage
Life and disability insurance
Savings plan
Company paid holidays
Paid time off (PTO) for vacation and/or personal business
Company
Northrop Grumman
Northrop Grumman is an aerospace, defense and security company that provides training and satellite ground network communications software.
Funding
Current Stage: Public Company
Total Funding: $3.7B
Key Investors: U.S. Department of Defense, NASA
2025-05-27 · Post-IPO Debt · $1B
2024-01-29 · Post-IPO Debt · $2.5B
2023-12-20 · Grant · $72M
Company data provided by Crunchbase