Inside Higher Ed · 1 week ago
HPC Sr. Scientific Software Engineer (IT@JH Research Computing)
Johns Hopkins University's IT@JH Research Computing is seeking an HPC Sr. Scientific Software Engineer to support research computing initiatives; the position is listed via Inside Higher Ed. The role involves developing and optimizing scientific software deployment strategies on HPC and AI systems, collaborating with cross-functional teams, and mentoring junior engineers.
Digital Media · Education · Higher Education · Journalism · Recruiting
Responsibilities
Develop and refine deployment strategies for scientific software on HPC and AI systems
Design computational workflows, select optimal software configurations, and utilize tools like Ansible for automation
Assist teams in implementing, tuning, and optimizing AI models and gateway applications (e.g., XDMoD, Coldfront, Open OnDemand, CryoSPARC Live, SBGrid, AI Agents)
Analyze and optimize the performance of AI models and HPC applications, focusing on GPU-enabled computing
Implement parallel processing, distributed computing, and resource management techniques for efficient job execution
Develop, debug, and maintain software tools, libraries, and frameworks supporting HPC and AI workloads
Collaborate with the system team and software vendors (e.g., NVIDIA, Intel, MathWorks) to optimize systems for maximum performance
Utilize CUDA, cuDNN, TensorRT, and Intel compilers to enhance system performance
Manage and support scientific software deployment across HPC, cloud-based, and colocation facilities
Oversee installation, configuration, and maintenance of HPC packages with tools like CMake, Make, EasyBuild, Spack, and Lua module files
Work closely with cross-functional teams, including researchers, data scientists, and software developers, to address complex HPC/AI challenges
Mentor junior engineers and foster a culture of continuous learning
Resolve complex technical issues and perform root cause analysis for HPC/AI software challenges
Implement effective solutions to prevent recurrence and improve system reliability
Provide training workshops for researchers and students, focusing on troubleshooting, optimizing workflows, and effectively using HPC systems
Stay current with advances in HPC and AI technologies and methodologies
Incorporate new research findings into existing systems to improve performance and capabilities
Develop and manage container orchestration strategies to ensure scalability, reliability, and security of applications
Oversee the container lifecycle from creation and deployment to scaling and removal
Create comprehensive documentation for system designs, performance metrics, and project status
Ensure compliance with security and regulatory standards for all HPC and AI systems
Design, deploy, and maintain large-scale Linux HPC clusters with CPU/GPU resources, high-speed networks, and distributed storage
Develop and maintain automation frameworks for provisioning, monitoring, and software lifecycle management
Implement and optimize job scheduling, container orchestration, and workflow automation tools to support diverse research workloads
Collaborate with faculty and research teams to parallelize, containerize, and scale computational workflows for multi-GPU and distributed environments
Benchmark and tune application performance across architectures, documenting findings and sharing best practices
Integrate and support AI/ML frameworks, scientific libraries, and workflow engines (Snakemake, Nextflow, Dask, Ray)
Ensure system and application reliability through proactive monitoring (Prometheus, Grafana, ELK) and incident response participation
Support reproducibility and FAIR data principles through version-controlled, containerized environments
Contribute to documentation, training materials, and technical guidance to enhance user experience and self-service capabilities
Participate in evaluation and adoption of new technologies to advance performance, efficiency, and sustainability in research computing
Qualifications
Required
PhD in a quantitative discipline
Five years of experience in HPC user support, software deployment, and performance optimization within an academic or research environment
Additional education may substitute for required experience and additional related experience may substitute for required education beyond a high school diploma/graduation equivalent, to the extent permitted by the JHU equivalency formula
Preferred
Eight or more years of professional experience in high-performance computing, large-scale systems, or research software engineering
Deep proficiency in Linux systems administration, performance tuning, and automation tools (Ansible, Terraform, Jenkins, or similar)
Experience with cluster management, workload schedulers (e.g., Slurm), and distributed or parallel file systems (e.g., GPFS, Lustre, WekaFS, Ceph)
Strong background in programming or scripting (Python, Bash, C/C++, Go, or Rust)
Familiarity with containerization and orchestration technologies used in HPC (Singularity, Apptainer, Docker, Kubernetes)
Understanding of high-speed interconnects (InfiniBand, 100/400 Gb Ethernet) and storage/data access patterns for AI and analytics
Experience developing or maintaining CI/CD pipelines and module environments (Lmod/Spack) for research software
Knowledge of GPU computing (CUDA, ROCm), MPI/OpenMP, and AI/ML frameworks
Demonstrated ability to collaborate with researchers on performance optimization, workflow design, and reproducible computing
Company
Inside Higher Ed
Inside Higher Ed is the online source for news, opinion, and jobs related to higher education.
Funding
Current Stage: Growth Stage
Total Funding: unknown
2022-01-10: Acquired
2006-08-31: Series Unknown
Recent News
Research & Development World (2025-05-03)
Business Standard India (2025-04-11)
Company data provided by Crunchbase