HPC Infrastructure Solution Architect jobs in United States

Doghouse Recruitment · 7 hours ago

HPC Infrastructure Solution Architect

Doghouse Recruitment is seeking an HPC Infrastructure Solution Architect to join an AI infrastructure team building GPU, networking, and storage platforms for large-scale AI training workloads. The role involves designing and operating production-grade GPU and HPC platforms and ensuring the quality, scalability, and efficiency of the infrastructure.

Staffing & Recruiting
Hiring Manager: Chris Sterk

Responsibilities

Design and operate production-grade GPU and HPC platforms for AI/ML training and simulation
Build and scale GPU clusters, with a strong focus on Slurm-based scheduling
Design and optimize high-performance networking using RDMA, InfiniBand, NVLink, and NVSwitch
Design and tune storage and I/O paths for large-scale datasets
Build cloud infrastructure using open-source tooling such as Kubernetes, Terraform, and Helm

Qualifications

GPU cluster management
HPC infrastructure design
Linux expertise
Kubernetes
Networking expertise
Storage optimization
Cloud infrastructure
Slurm scheduler
RDMA knowledge
Multi-cloud experience

Required

Hands-on experience building and operating GPU or HPC clusters
Strong Linux, Kubernetes, networking, and storage background
Deep understanding of HPC networking and RDMA stacks
Experience with GPU schedulers, preferably Slurm
Strong cloud experience, ideally multi-cloud
Strong storage and I/O expertise

Preferred

Experience with specific storage technologies

Company

Doghouse Recruitment

Recruitment for your technology teams. You don't need another agency flooding your inbox with mismatched candidates.

Funding

Current Stage: Early Stage
Company data provided by Crunchbase