Hydra Host · 2 months ago
Storage Engineer at Hydra Host
Hydra Host is a Founders Fund-backed NVIDIA cloud partner building infrastructure for AI at scale. They are seeking a Storage Engineer to lead the architecture, development, and deployment of their next-generation AI/HPC storage platform, focusing on designing and building a production-grade storage system to support bare-metal GPU clusters.
Artificial Intelligence (AI), Cloud Infrastructure, Developer APIs, Web Hosting
Responsibilities
Define, architect, and implement Hydra Host’s first production storage platform tailored for bare-metal GPU clusters and AI/HPC workloads
Lead all technical decisions around storage stack design, from hardware infrastructure to parallel file system orchestration and performance tuning
Select, build, and maintain storage solutions spanning both block (NVMe, SAN, Ceph, etc.) and object storage (S3-compatible, custom, or Ceph Object Gateway) layers
Design for high-throughput, low-latency access, supporting large datasets, rapid checkpointing, and parallel access for distributed AI training workloads
Integrate and optimize parallel file systems such as Lustre, BeeGFS, Spectrum Scale, WekaIO, or CephFS, ensuring maximum performance and fault tolerance
Ensure compatibility across Hydra’s diverse GPU/OEM ecosystem, accounting for unique firmware, BMC/Redfish APIs, and hardware configurations
Develop automation, observability, and management tooling for storage, focusing on reliability, scalability, and efficiency
Act as both builder and architect: deeply hands-on in deployment, troubleshooting, and optimization, while guiding the long-term storage roadmap
Collaborate cross-functionally with GPU, HPC, and platform engineering teams to integrate storage with compute and network layers
Interface with customers and product leadership to define feature priorities, performance benchmarks, and future enhancements
Qualifications
Required
8+ years of progressive, hands-on experience designing and implementing high-performance storage systems for compute clusters in HPC, AI, or bare-metal cloud environments
Proven track record building storage infrastructure from scratch, not just operating existing systems
Deep expertise in block storage (NVMe, SAN, Ceph, distributed block systems) and object storage (S3, MinIO, Ceph Object Gateway, etc.)
Strong background in parallel file systems (WekaIO, BeeGFS, Lustre, Spectrum Scale, or similar) supporting GPU or AI cluster workloads
Solid foundation in Linux systems engineering, automation, and scripting for distributed environments
Familiarity with BMC, Redfish APIs, and OEM server firmware for bare-metal management
Deep understanding of AI/ML data pipelines: model checkpointing, data locality, and multi-tiered storage optimization
Excellent problem-solving, debugging, and communication skills, able to translate technical decisions into clear architectural direction
Preferred
Experience building storage solutions for large-scale GPU or HPC infrastructure
History of technical leadership or mentorship, growing teams or owning a product roadmap
Experience evaluating and managing vendor relationships and negotiating storage hardware/software contracts
Contributions to open-source HPC or storage projects (Ceph, Lustre, BeeGFS, etc.)
Familiarity with confidential computing, secure data handling, or high-availability architectures
Company
Hydra Host
Hydra offers a bare-metal GPU platform, connecting businesses to a variety of independent but standardized AI Factory Franchises.
Funding
Current Stage
Early Stage
Total Funding
$10M
Key Investors
Flume Ventures, Founders Fund
2025-02-10 Seed
2024-09-12 Seed
2022-04-06 Seed · $10M
Company data provided by crunchbase