Lambda · 5 months ago
Storage Engineering Manager
Lambda is the #1 GPU Cloud for ML/AI teams, providing a platform for building, testing, and deploying AI products. The Storage Engineering Manager will lead a team to develop and manage high-performance storage solutions for AI/ML infrastructure, ensuring operational excellence and scalability.
AI Infrastructure · Artificial Intelligence (AI) · Cloud Computing · Data Center · GPU · Machine Learning
Responsibilities
Hire, grow, lead, and mentor a team of high-performing storage engineers delivering petabyte-scale HPC storage solutions
Foster a high-velocity culture of innovation, technical excellence, and collaboration
Conduct regular one-on-one meetings, provide constructive feedback, and support career development for team members
Drive outcomes by managing project priorities, deadlines, and deliverables using Agile methodologies
Drive the technical vision and strategy for Lambda distributed storage solutions
Lead storage vendor selection criteria, vendor selection, and vendor relationship management (support, installation, scheduling, specification, procurement)
Manage team in storage lifecycle management (installation, cabling, capacity upgrades, service, RMA, updating both hardware and software components as needed)
Guide choices around optimization of storage pools, sharding, and tiering/caching strategies
Lead team in tasks related to multi-tenant security, tenant provisioning, metering integration, storage protocol interconnection, and customer data-migration
Guide Storage SREs in development of scripting and automation tools for configuration management, monitoring, and operational tasks
Guide team in problem identification, requirements gathering, solution ideation, and stakeholder alignment on engineering RFCs
Lead the team in supporting customers
Collaborate with the HPC Architecture team on drive selection, capacity determination, storage networking, cache placement, and rack layouts
Work closely with the storage software teams and networking teams to execute on cross-functional infrastructure initiatives and new data-center deployments including integration of storage protocols across a variety of on-prem storage solutions
Work with procurement, data-center operations, and fleet engineering teams to deploy storage solutions into new and existing data centers
Work with vendors to troubleshoot customer performance, reliability, and data-integrity issues
Work closely with Networking, Compute, and Storage Software Engineering teams to deploy high-performance distributed storage solutions to serve AI/ML workloads
Partner with the fleet engineering team to ensure seamless deployment, monitoring, and maintenance of the distributed storage solutions
Stay current with the latest trends and research into AI and HPC storage technologies and vendor solutions
Guide the team in investigating strategies for using NVIDIA SuperNIC DPUs for storage edge-caching, offloading, and GPUDirect Storage capabilities
Work with the Lambda product team to uncover new trends in the AI inference and training product category that will inform emerging storage solutions
Encourage and support the team in exploring new technologies and approaches to improve system performance and efficiency
Qualifications
Required
10+ years of experience in storage engineering, with at least 5 years in a management or lead role
Demonstrated experience leading a team of storage engineers and storage SREs on complex, cross-functional projects in a fast-paced startup environment
Extensive hands-on experience designing, deploying, and maintaining distributed storage solutions at a CSP (Cloud Service Provider), NCP (Neo-Cloud Provider), HPC-infrastructure integrator, or AI-infrastructure company
Experience with storage solutions serving storage volumes at a scale greater than 20PB
Strong project management skills, leading high-confidence planning, project execution, and delivery of team outcomes on schedule
Extensive experience with storage site reliability engineering
Experience with one or more of the following in an HPC or AI Infrastructure environment: Vast, DDN, Pure Storage, NetApp, Weka
Experience deploying Ceph at a scale greater than 25PB
Experience in serving one or more of the following storage protocols: object storage (e.g., S3), block storage (e.g., iSCSI), or file storage (e.g., NFS, SMB, Lustre)
Professional individual contributor experience as a storage engineer or storage SRE
Familiarity with modern storage technologies (e.g., NVMe, RDMA, DPUs) and their role in optimizing performance
Experience building a high-performance team through deliberate hiring, upskilling, planned skills redundancy, performance-management, and expectation setting
Preferred
Experience driving cross-functional engineering management initiatives (coordinating events, strategic planning, coordinating large projects)
Experience with NVIDIA SuperNIC DPUs for edge-caching (such as implementing GPUDirect Storage)
Deep experience with Vast, Weka and/or NetApp in an HPC or AI Infrastructure environment
Deep experience implementing Ceph in an HPC or AI infrastructure environment at a scale greater than 100PB
Experience driving organizational improvements (processes, systems, etc.)
Experience training or managing managers
Benefits
Health, dental, and vision coverage for you and your dependents
Wellness and Commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible Paid Time Off Plan that we all actually use
Company
Lambda
Lambda is a cloud-based platform that provides high-performance GPU hardware and cloud infrastructure for AI model training and inference.
H1B Sponsorship
Lambda has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (16)
2024 (1)
2023 (3)
2022 (2)
2021 (2)
2020 (3)
Funding
Current Stage: Late Stage
Total Funding: $3.19B
Key Investors: TWG Global, JP Morgan, Macquarie Group
2025-11-18 · Series E · $1.5B
2025-08-19 · Debt Financing · $275M
2025-02-19 · Series D · $480M
Company data provided by Crunchbase