MIT Office of Resource Development · 1 month ago
Senior HPC Systems Engineer
The Massachusetts Green High Performance Computing Center (MGHPCC) is seeking a Senior HPC Systems Engineer to support the computing infrastructure behind the Massachusetts AI Hub’s AI Computing Resource (AICR). This hands-on role will be responsible for deploying, maintaining, and optimizing HPC clusters, storage systems, and networking for AI/ML workloads.
Responsibilities
Solid track record in HPC systems administration/engineering, including hands-on experience with Linux-based cluster environments, job schedulers, and high-performance storage systems such as VAST, Lustre, or GPFS; strong skills in system monitoring, troubleshooting, automation, and performance optimization; and experience supporting end-users in technical research environments
Familiarity with GPU computing, scripting in Python or Bash, and container and orchestration tools such as Docker and Kubernetes; experience with cloud-based HPC or AI/ML workloads
Qualifications
Required
Solid track record in HPC systems administration/engineering, including hands-on experience with Linux-based cluster environments
Experience with job schedulers
Experience with high-performance storage systems such as VAST, Lustre, or GPFS
Strong skills in system monitoring
Strong skills in troubleshooting
Strong skills in automation
Strong skills in performance optimization
Experience supporting end-users in technical research environments
Preferred
Familiarity with GPU computing
Scripting in Python or Bash
Familiarity with container and orchestration tools such as Docker and Kubernetes
Experience in cloud-based HPC or AI/ML workloads