Lambda · 2 months ago
AI Infrastructure Deployment Lead
Lambda is a company focused on building Gigawatt-scale AI Factories for Training and Inference. The AI Infrastructure Deployment Lead will be responsible for planning and executing the deployment of large-scale AI infrastructure, leading technical teams to design network topologies and ensure the delivery of optimized compute environments.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Data Center, GPU, Machine Learning
Responsibilities
Lead end-to-end deployment of GPU clusters, storage systems, and networking fabric across Lambda’s data centers
Design and implement data center network topologies optimized for AI and HPC workloads, including high-speed Ethernet and InfiniBand environments
Oversee rack implementation, cabling, and power/cooling validation for optimal efficiency and scalability
Collaborate with supply chain, logistics, and operations teams to ensure smooth delivery and installation timelines
Implement Layer 2/Layer 3 networks, including VLANs, spine-leaf architecture, and InfiniBand interconnects
Partner with network architects to ensure redundancy, scalability, and low-latency interconnects for distributed AI workloads
Monitor network health, identify bottlenecks, and implement optimizations to maintain peak performance
Oversee server hardware troubleshooting, including GPUs, NICs, CPUs, and storage components
Lead root-cause analysis for system issues and drive corrective actions in collaboration with vendors and internal hardware teams
Develop standard operating procedures (SOPs) for hardware validation, deployment, and maintenance
Serve as technical project lead for infrastructure rollouts and cluster expansion projects
Coordinate cross-functional teams — networking, facilities, cloud operations, and hardware engineering — to execute deployments on schedule
Manage project scope, budgets, risk assessments, and post-deployment reviews
Communicate status, challenges, and milestones to leadership with clarity and precision
Maintain detailed network topology diagrams, deployment runbooks, and hardware inventories
Identify opportunities for process automation and infrastructure standardization across deployments
Contribute to Lambda’s internal knowledge base and mentor junior engineers on data center best practices
Qualifications
Required
Bachelor's degree in Computer Engineering, Information Technology, or related field
CCNA (Cisco Certified Network Associate) certification (CCNP or equivalent a plus)
PMP (Project Management Professional) certification (PMP or equivalent a plus)
5+ years of experience in data center infrastructure deployment or network operations, preferably in AI, HPC, or cloud environments
Proven ability to lead complex technical projects and manage multidisciplinary teams
Strong understanding of data center network design (Layer 2/3, VLANs, rack elevations, port mapping, InfiniBand technologies)
Hands-on expertise in server hardware troubleshooting and rack-level integration
Ability and willingness to travel 50-70% to our data center sites
Preferred
Experience deploying or managing GPU clusters and distributed training environments
Familiarity with automation and orchestration tools (Ansible, Terraform) and monitoring systems (Prometheus, Grafana)
Knowledge of structured cabling, power distribution, and environmental monitoring in data centers
Excellent communication and documentation skills
Benefits
Health, dental, and vision coverage for you and your dependents
Wellness and commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible Paid Time Off Plan that we all actually use
Company
Lambda
Lambda is a cloud-based platform that provides high-performance GPU hardware and cloud infrastructure for AI model training and inference.
H1B Sponsorship
Lambda has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (16)
2024 (1)
2023 (3)
2022 (2)
2021 (2)
2020 (3)
Funding
Current Stage: Late Stage
Total Funding: $3.19B
Key Investors: TWG Global, JP Morgan, Macquarie Group
2025-11-18 · Series E · $1.5B
2025-08-19 · Debt Financing · $275M
2025-02-19 · Series D · $480M
Company data provided by Crunchbase