Andiamo
Forward Deployed Engineers - Decentralized High-Performance Computing Leader
Andiamo is a globally recognized staffing and consulting firm that places top technology professionals. It is seeking a Senior Forward Deployed Engineer to design, deploy, and optimize large-scale GPU clusters for AI infrastructure, working closely with customers to improve their AI workloads.
Consulting · Human Resources · Information Technology · Staffing Agency
Responsibilities
Design, deploy, and manage clusters exceeding 1,000 GPUs using custom-built automation playbooks and infrastructure-as-code tools
Diagnose and enhance the performance of compute, storage, and networking systems, collaborating closely with providers to deliver peak efficiency
Orchestrate large-scale data migrations across cloud and on-prem environments, handling petabytes of data with precision and speed
Troubleshoot complex issues across the stack, from debugging hardware anomalies to optimizing distributed data loaders across multi-region buckets
Develop robust internal tooling to streamline deployments, strengthen reliability, and empower automation where it truly makes an impact
Provide direct technical support during customer operations and participate in a rotating on-call schedule for critical environments
Qualifications
Required
2+ years of experience in Software Engineering, Site Reliability Engineering, DevOps, Systems Administration, or High-Performance Computing
Proficiency in deploying and managing Kubernetes and/or SLURM clusters
Hands-on experience coding in Go, Python, and Bash
Strong familiarity with Ansible, Terraform, and other automation or Infrastructure-as-Code tools
Solid foundation in Computer Science, Engineering, or a related technical field
Exceptional verbal and written communication skills in English
Preferred
Building and operating AI workloads at 1,000+ GPU scale
Developing and maintaining large-scale, multi-tenant Kubernetes-based services
Deploying and managing datacenter hardware or bare-metal environments via MaaS, NetBox, or equivalent tools
Managing InfiniBand or RoCE network deployments supporting multi-tenant architectures
Designing and operating petabyte-scale all-flash or distributed storage systems (e.g., DDN, VAST, Weka, Ceph, or Lustre)
Company
Andiamo
The Talent Partners for the AI Revolution.
H1B Sponsorship
Andiamo has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The additional information below is provided for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2022 (2)
2021 (1)
Funding
Current Stage
Growth Stage
Company data provided by Crunchbase