AI Solutions Engineer at Hydra Host jobs in United States

Hydra Host · 2 weeks ago

AI Solutions Engineer at Hydra Host

Hydra Host is a Founders Fund-backed NVIDIA cloud partner building infrastructure for AI at scale. The AI Solutions Engineer will ensure an exceptional technical experience for AI Platform and Enterprise customers, working on proof-of-concept AI platforms and collaborating with various teams to optimize performance and enhance customer enablement.

Artificial Intelligence (AI), Cloud Infrastructure, Developer APIs, Web Hosting

Responsibilities

Prototype and operate proof-of-concept AI platforms and neo-clouds on top of Hydra using the Brokkr API to validate the developer experience
Build and maintain an open-source “neo-cloud in a box” reference implementation that demonstrates multi-tenancy, spins servers up and down based on demand, and exposes containerized or virtualized GPU access
Dogfood Hydra’s APIs, infrastructure, and tooling to continuously find gaps, sharp edges, and failure modes before customers do, and work with product and engineering to resolve them
Work closely with the API and monetization teams by incorporating direct customer feedback into feature prioritization, pricing models, and API design
Run and validate the latest AI platforms, inference stacks, and orchestration frameworks on Hydra to ensure first-class support
Collaborate closely with product and engineering to turn learnings into productized workflows, defaults, and automations
Create targeted provisioning templates (e.g., self-managed Kubernetes, specialized inference engines, custom OS images) by researching common software stacks, licenses, and dependencies used by AI platforms
Provide developers with high-quality technical enablement: code samples, SDK contributions, reference implementations, and clear documentation
Act as a technical voice for Hydra’s developer ecosystem: host webinars, write technical content, run demos, participate in events, and support hackathons showcasing what’s possible on Hydra
Document best practices and standardize configurations to scale customer success globally

Qualifications

NVIDIA GPU Stack, Bare Metal Linux, AI Workloads, Workload Orchestration, Scripting, Networking, Monitoring, Container Runtimes, Cloud Provisioning, Observability, HPC Clusters, TEE, Storage Systems, BMC Provisioning, Customer Obsession, Principled Thinking, Technical Curiosity, Systems Thinking

Required

NVIDIA GPU Stack - Deep knowledge of NVIDIA hardware and software (drivers, firmware, NVLink, NCCL, CUDA, libraries), and of how stack compatibility impacts performance
Bare Metal Linux - Strong experience in bare-metal Linux systems administration, driver stacks, and kernel options
AI Workloads - Proficiency running Hugging Face, PyTorch, vLLM, and other model deployment frameworks, as well as large-scale inference/training
AI Benchmarking - Hands-on experience benchmarking AI workloads such as Megatron
Workload Orchestration - Experience running Kubernetes clusters (CAPI), Slurm, and Ansible for cluster automation and workload management
Scripting - Solid scripting skills (e.g., shell scripts, Perl, Ruby, Python)
Networking - OSI Layer 2/Layer 3 fundamentals (TCP/IP, DNS), VLANs, bonding
East/West Networking - Familiarity with RoCE or InfiniBand
Observability and Monitoring - nvidia-smi profiling, Prometheus/Grafana or ELK stack
Container Runtimes - Docker, Podman, Singularity
Cloud Provisioning - Terraform, cloud-init, etc.

Preferred

HPC Clusters - Experience in HPC or large distributed training environments
TEE - Familiarity with Trusted Execution Environments, Intel TDX, or Confidential Compute
Storage Systems - Familiarity with local and distributed storage: NVMe, RAID, NFS, Ceph, WEKA, VAST, DDN, etc.
BMC Provisioning - MAAS, iPXE, IPMI

Benefits

Equity ownership
Competitive salary
Healthcare coverage
Fully remote team
Direct impact

Company

Hydra Host

Hydra offers a bare metal GPU platform, connecting businesses to a variety of independent but standardized AI Factory Franchises.

Funding

Current Stage
Early Stage
Total Funding
$10M
Key Investors
Flume Ventures, Founders Fund
2025-02-10 · Seed
2024-09-12 · Seed
2022-04-06 · Seed · $10M

Leadership Team

Aaron Ginn
CEO & Co-Founder
Garrett Johnson
Co-Founder & COO
Company data provided by Crunchbase