Skew Talent
Head of AI Infrastructure Engineering
Skew Talent is seeking a Head of AI Infrastructure Engineering to oversee the development of a large-scale AI infrastructure deployment in India. The role involves designing GPU cluster architectures, managing vendor relationships, and leading a team of engineers to ensure optimal performance and reliability of the infrastructure.
Responsibilities
Design GPU cluster architectures for training and inference at scale (thousands of GPUs, not dozens)
Specify hardware configurations: GPU servers, networking fabric, storage systems, power and cooling
Evaluate and select vendors; negotiate technical specifications directly with OEMs such as Dell, Supermicro, HPE, and NVIDIA
Work with facility teams on power infrastructure, electrical distribution, and cooling solutions for high-density AI deployments
Build automation for cluster provisioning, configuration management, and lifecycle operations
Implement job scheduling and workload management (Slurm, Kubernetes, custom orchestration as needed)
Establish monitoring, alerting, and observability for infrastructure health at scale
Lead calls with overseas teams to review progress, present architectures, and provide technical guidance
Define operational runbooks, incident response, and SRE practices
Build and lead a team of infrastructure engineers, systems administrators, and hardware specialists
Travel to India periodically to work directly with data center and operations teams
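For a sense of the workload-management responsibility above: clusters of this kind are commonly driven through Slurm. A minimal sketch of a multi-node GPU training submission follows; the job name, partition, node counts, and script path are illustrative assumptions, not specifics from this role.

```shell
#!/bin/bash
# Hypothetical Slurm batch script for a multi-node distributed training job.
# Partition name, node count, and GPU count per node are illustrative only.
#SBATCH --job-name=train-example
#SBATCH --partition=gpu
#SBATCH --nodes=16
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=8
#SBATCH --time=72:00:00

# Launch one task per GPU; srun handles task placement across nodes.
srun python train.py
```

Submitted with `sbatch`, this requests 16 nodes with 8 GPUs each; in practice, such scripts are generated and versioned by the provisioning automation described above rather than written by hand.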
Qualifications
Required
You've built GPU infrastructure at scale; you know NVIDIA's ecosystem (DGX, HGX, NVLink, NVSwitch, CUDA, NCCL) from hands-on experience, not just vendor briefings
Deep expertise in high-performance networking: InfiniBand, 400G Ethernet, RDMA, GPUDirect; you understand why network topology matters for distributed training
Strong Linux systems engineering background; you've managed thousands of nodes and know what breaks at scale
Experience with storage systems for ML workloads: Lustre, GPFS, BeeGFS, NVMe-oF, parallel file systems
You've worked at a hyperscaler (AWS, GCP, Azure) or AI-native infrastructure provider (CoreWeave, Lambda, Crusoe, or similar); you know what good looks like
Comfortable with data center operations: power, cooling, rack density, PUE optimization; you can have a real conversation with facilities engineers
You can make decisions with incomplete information and defend them technically; you don't wait for perfect specs before moving forward
Able to hold a high bar and push teams toward excellence without being a know-it-all
Strong communicator who can translate between hardware vendors, operations teams, and business stakeholders across time zones
Hungry to build something from the ground up; you're not looking for a role where you inherit someone else's architecture
Comfortable with ambiguity and able to take confident action when details are missing
Preferred
Experience with advanced cooling: liquid cooling, two-phase cooling, immersion systems
Background in greenfield data center buildouts, not just operating existing infrastructure
Familiarity with India-specific considerations: power procurement, regulatory requirements, vendor landscape
Prior work with AI/ML frameworks and MLOps; you understand what the workloads actually look like
Benefits
Medical and Dental benefits
401(k)
Office space in Seattle with remote flexibility; we value quality candidates over location
Direct reporting to leadership with minimal bureaucracy
Ground-floor opportunity to build infrastructure at unprecedented scale
Small, sharp team culture that uses AI extensively in our own work
Company
Skew Talent
Honest, straightforward recruiting & talent strategy.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase