AI Infrastructure Deployment Lead jobs in United States

Lambda · 2 months ago

AI Infrastructure Deployment Lead

Lambda is a company focused on building gigawatt-scale AI factories for training and inference. The AI Infrastructure Deployment Lead will be responsible for planning and executing the deployment of large-scale AI infrastructure, leading technical teams to design network topologies and ensure the delivery of optimized compute environments.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Data Center · GPU · Machine Learning
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Lead end-to-end deployment of GPU clusters, storage systems, and networking fabric across Lambda’s data centers
Design and implement data center network topologies optimized for AI and HPC workloads, including high-speed Ethernet and InfiniBand environments
Oversee rack implementation, cabling, and power/cooling validation for optimal efficiency and scalability
Collaborate with supply chain, logistics, and operations teams to ensure smooth delivery and installation timelines
Implement Layer 2/Layer 3 networks, including VLANs, spine-leaf architecture, and InfiniBand interconnect technology
Partner with network architects to ensure redundancy, scalability, and low-latency interconnects for distributed AI workloads
Monitor network health, identify bottlenecks, and implement optimizations to maintain peak performance
Oversee server hardware troubleshooting, including GPUs, NICs, CPUs, and storage components
Lead root-cause analysis for system issues and drive corrective actions in collaboration with vendors and internal hardware teams
Develop standard operating procedures (SOPs) for hardware validation, deployment, and maintenance
Serve as technical project lead for infrastructure rollouts and cluster expansion projects
Coordinate cross-functional teams (networking, facilities, cloud operations, and hardware engineering) to execute deployments on schedule
Manage project scope, budgets, risk assessments, and post-deployment reviews
Communicate status, challenges, and milestones to leadership with clarity and precision
Maintain detailed network topology diagrams, deployment runbooks, and hardware inventories
Identify opportunities for process automation and infrastructure standardization across deployments
Contribute to Lambda’s internal knowledge base and mentor junior engineers on data center best practices

Qualifications

Data center network design · GPU cluster deployment · Server hardware troubleshooting · Technical project leadership · CCNA certification · PMP certification · Automation tools · Monitoring systems · Communication · Documentation skills

Required

Bachelor's degree in Computer Engineering, Information Technology, or related field
CCNA (Cisco Certified Network Associate) certification (CCNP or equivalent a plus)
PMP (Project Management Professional) certification or equivalent
5+ years of experience in data center infrastructure deployment or network operations, preferably in AI, HPC, or cloud environments
Proven ability to lead complex technical projects and manage multidisciplinary teams
Strong understanding of data center network design (Layer 2/3, VLANs, rack elevations, port mapping, InfiniBand technologies)
Hands-on expertise in server hardware troubleshooting and rack-level integration
Ability and willingness to travel 50-70% to our data center sites

Preferred

Experience deploying or managing GPU clusters and distributed training environments
Familiarity with automation and orchestration tools (Ansible, Terraform) and monitoring systems (Prometheus, Grafana)
Knowledge of structured cabling, power distribution, and environmental monitoring in data centers
Excellent communication and documentation skills

Benefits

Health, dental, and vision coverage for you and your dependents
Wellness and commuter stipends for select roles
401(k) plan with 2% company match (US employees)
Flexible Paid Time Off Plan that we all actually use

Company

Lambda

Lambda is a cloud-based platform that provides high-performance GPU hardware and cloud infrastructure for AI model training and inference.

H1B Sponsorship

Lambda has a track record of offering H1B sponsorships. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Distribution of Different Job Fields Receiving Sponsorship
Trends of Total Sponsorships
2025 (16)
2024 (1)
2023 (3)
2022 (2)
2021 (2)
2020 (3)

Funding

Current Stage: Late Stage
Total Funding: $3.19B
Key Investors: TWG Global, JP Morgan, Macquarie Group
2025-11-18: Series E · $1.5B
2025-08-19: Debt Financing · $275M
2025-02-19: Series D · $480M

Leadership Team

Stephen Balaban, Co-founder and CEO
Michael Balaban, Co-founder and CTO
Company data provided by Crunchbase