AI Cluster Architect jobs in United States

Vultr · 22 hours ago

AI Cluster Architect

Vultr is a leading cloud infrastructure company focused on providing high-performance cloud solutions for enterprises and AI innovators. They are seeking an AI Cluster Architect responsible for designing large-scale GPU cluster architectures while adhering to stringent power and infrastructure limits, optimizing for GPU density and service requirements.
Artificial Intelligence (AI) · Cloud Computing · Cloud Infrastructure · Cloud Storage · Web Hosting
Work & Life Balance
H1B Sponsor Likely

Responsibilities

Architect large-scale GPU clusters within fixed site power budgets, optimizing for maximum GPU density while reserving the necessary headroom for compute services, storage, and networking
Model and validate power consumption across the full cluster bill of materials (GPUs, CPUs, NICs, switches, fabric components, storage, and facility limits)
Evaluate tradeoffs among fabric networking architectures (InfiniBand, RoCE, Spectrum-X) and topologies (multi-plane, 2-tier/3-tier, and rail-optimized)
Determine network scale limits based on switch radix, link speed, topology, and blocking requirements
Gather, interpret, and maintain detailed SKU-level power and thermal specifications for GPUs, NICs, switches, DPUs, storage, and server platforms
Develop power-aware cluster configuration templates and capacity-planning models that can scale across sites with varying constraints and allow for quick iteration and ideation
Document architecture, design choices, tradeoff analyses, and operational considerations for deployment and lifecycle management
Provide guidance on future-proofing, including the ability to incorporate next-gen GPUs, NICs, or fabrics
Collaborate with vendors on novel fabric architectures that enable large-scale cluster deployments (100k+ GPUs)

Qualifications

HPC cluster design · GPU architecture · Networking technologies · Power modeling · Thermal characteristics · Documentation skills · Cross-functional collaboration · Communication skills

Required

7+ years designing or building large-scale HPC, AI, or hyperscale GPU clusters
Expert understanding of GPU and accelerator system design, including node topology, PCIe/NVLink/NVSwitch/ROCm, and NIC-to-GPU affinity considerations
Strong familiarity with InfiniBand, RoCE, and Spectrum-X networking, including multi-tier, multi-plane, Clos/dragonfly variants, and large-radix switch design
Demonstrated experience modeling power draw and thermal characteristics of servers, GPUs, NICs, switches, optics, and storage systems
Ability to design networks that maintain full non-blocking performance or intentionally introduce over/under-subscription while understanding impacts on workload performance
Proven ability to gather and analyze vendor SKU-level specifications and incorporate them into scalable cluster architectures
Experience balancing customer-driven requirements for compute, storage, and service density in combination with overall GPU count
Strong documentation, communication, and cross-functional collaboration skills

Benefits

Excellent medical benefits with 100% company-paid premiums for the employee-only plan, plus 100% company-paid dental and vision premiums
401(k) plan that matches 100% up to 4% with immediate vesting
Professional Development Reimbursement of $2,500 each year
11 Holidays + Paid Time Off Accrual + Rollover Plan + take your birthday off
Increased PTO at 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + anniversary bonus each year
$500 first year remote office setup + $400 each following year for new equipment
Internet reimbursement up to $75 per month
Gym membership reimbursement up to $50 per month
Company-paid Wellable subscription

Company

Vultr

Vultr is an AI cloud infrastructure platform offering the latest-generation NVIDIA GPUs and AMD CPUs and GPUs across 32 regions worldwide.

H1B Sponsorship

Vultr has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships: 2024 (1)

Funding

Current Stage: Growth Stage
Total Funding: $662M
Key Investors: Bank of America, JP Morgan Chase, Wells Fargo, AMD Ventures, LuminArx Capital Management LP
2025-06-23 · Debt Financing · $329M
2024-12-18 · Private Equity · $333M
2014-02-20 · Angel

Leadership Team

Mike Marinescu
Chief Technology Officer
Company data provided by Crunchbase