AI/ML Solutions Architect jobs in United States

Doghouse Recruitment · 13 hours ago

AI/ML Solutions Architect

Doghouse Recruitment is seeking an AI/ML Solutions Architect to join their fast-moving AI infrastructure team focused on large-scale ML workloads. The role involves designing and validating production-grade distributed training architectures and collaborating closely with clients to optimize ML workloads across multi-node GPU environments.

Staffing & Recruiting
Hiring Manager
Chris Sterk

Responsibilities

Design and validate production-grade distributed training (primary) and large-scale inference architectures on large GPU clusters, typically tens to thousands of GPUs
Work hands-on with customers to debug, optimize, and scale ML workloads across multi-node GPU environments
Act as a technical authority on GPU performance, networking, and schedulers, making trade-offs at scale and translating customer needs into concrete platform requirements
Collaborate closely with engineering, product, and R&D to influence roadmap decisions based on real-world ML workloads
This is a hands-on, technical role; you are expected to work directly in customer environments, not only advise at a high level

Qualifications

Multi-node GPU workloads, Distributed deep learning, GPU architecture, Kubernetes, Slurm

Required

Hands-on experience designing and operating production-grade, multi-node GPU workloads for training or inference
Strong background in distributed deep learning (PyTorch Distributed, DeepSpeed) on GPU clusters
Deep understanding of GPU architecture and interconnects (H100/A100 class, NVLink, InfiniBand)
Experience with Kubernetes or Slurm and performance tuning using GPU profiling and monitoring tools

Company

Doghouse Recruitment

Recruitment for your technology teams. You don't need another agency flooding your inbox with mismatched candidates.

Funding

Current Stage: Early Stage