Hydra Host · 2 weeks ago
AI Solutions Engineer at Hydra Host
Hydra Host is a Founders Fund-backed NVIDIA cloud partner building infrastructure for AI at scale. The AI Solutions Engineer will ensure an exceptional technical experience for AI Platform and Enterprise customers, working on proof-of-concept AI platforms and collaborating with various teams to optimize performance and enhance customer enablement.
Artificial Intelligence (AI) · Cloud Infrastructure · Developer APIs · Web Hosting
Responsibilities
Prototype and operate proof-of-concept AI platforms and neo-clouds on top of Hydra using the Brokkr API to validate the developer experience
Build and maintain an open-source “neo-cloud in a box” reference implementation that demonstrates multi-tenancy, spins servers up and down based on demand, and exposes containerized or virtualized GPU access
Dogfood Hydra’s APIs, infrastructure, and tooling to continuously find gaps, sharp edges, and failure modes before customers do, and work with product and engineering to resolve them
Work closely with the API and monetization teams by incorporating direct customer feedback into feature prioritization, pricing models, and API design
Run and validate the latest AI platforms, inference stacks, and orchestration frameworks on Hydra to ensure first-class support
Collaborate closely with product and engineering to turn learnings into productized workflows, defaults, and automations
Create targeted provisioning templates (e.g., self-managed Kubernetes, specialized inference engines, custom OS images) by researching common software stacks, licenses, and dependencies used by AI platforms
Provide developers with high-quality technical enablement: code samples, SDK contributions, reference implementations, and clear documentation
Act as a technical voice for Hydra’s developer ecosystem: host webinars, write technical content, run demos, participate in events, and support hackathons showcasing what’s possible on Hydra
Document best practices and standardize configurations to scale customer success globally
Qualifications
Required
NVIDIA GPU Stack - Deep knowledge of NVIDIA hardware and software (drivers, firmware, NVLink, NCCL, CUDA, libraries), and how stack compatibility impacts performance
Bare Metal Linux - Strong experience in bare-metal Linux systems administration, driver stacks, and kernel configuration options
AI Workloads - Proficiency running a variety of workloads with Hugging Face, PyTorch, vLLM, and other model deployment frameworks, including large-scale inference and training
AI Benchmarking - Hands-on experience benchmarking AI workloads such as Megatron
Workload Orchestration - Experience running Kubernetes clusters (CAPI) and Slurm, and using Ansible for cluster automation and workload management
Scripting - Solid scripting skills (e.g., shell scripts, Perl, Ruby, Python)
Networking - OSI Layer 2/Layer 3 fundamentals, TCP/IP, DNS, VLANs, and bonding
East / West - RoCE or InfiniBand familiarity
Observability and Monitoring - nvidia-smi profiling, Prometheus/Grafana or ELK stack
Container Runtimes - Container runtimes such as Docker, Podman, and Singularity
Cloud Provisioning - Terraform, cloud-init, etc.
Preferred
HPC Clusters - Experience in HPC or large distributed training environments
TEE - Familiarity with Trusted Execution Environments, Intel TDX, or Confidential Compute
Storage Systems - Familiarity with local and distributed storage and file systems: NVMe, NFS, RAID, Ceph, WEKA, VAST, DDN storage, etc.
BMC Provisioning - MAAS, iPXE, IPMI
Benefits
Equity ownership
Competitive salary
Healthcare coverage
Fully remote team
Direct impact
Company
Hydra Host
Hydra offers a bare metal GPU platform, connecting businesses to a variety of independent but standardized AI Factory Franchises.
Funding
Current Stage: Early Stage
Total Funding: $10M
Key Investors: Flume Ventures, Founders Fund
2025-02-10 · Seed
2024-09-12 · Seed
2022-04-06 · Seed · $10M
Company data provided by Crunchbase