Lambda · 5 months ago

Storage Engineering Manager

Lambda is the #1 GPU Cloud for ML/AI teams, providing a platform for building, testing, and deploying AI products. The Storage Engineering Manager will lead a team to develop and manage high-performance storage solutions for AI/ML infrastructure, ensuring operational excellence and scalability.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Data Center, GPU, Machine Learning
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Hire, grow, lead, and mentor a high-performing team of storage engineers delivering petabyte-scale HPC storage solutions
Foster a high-velocity culture of innovation, technical excellence, and collaboration
Conduct regular one-on-one meetings, provide constructive feedback, and support career development for team members
Drive outcomes by managing project priorities, deadlines, and deliverables using Agile methodologies
Drive the technical vision and strategy for Lambda's distributed storage solutions
Define storage vendor selection criteria, lead vendor selection, and manage vendor relationships (support, installation, scheduling, specification, procurement)
Manage the team's storage lifecycle work (installation, cabling, capacity upgrades, service, RMA, and hardware and software updates as needed)
Guide choices around optimization of storage pools, sharding, and tiering/caching strategies
Lead the team in tasks related to multi-tenant security, tenant provisioning, metering integration, storage protocol interconnection, and customer data migration
Guide Storage SREs in development of scripting and automation tools for configuration management, monitoring, and operational tasks
Guide team in problem identification, requirements gathering, solution ideation, and stakeholder alignment on engineering RFCs
Lead the team in supporting customers
Collaborate with the HPC Architecture team on drive selection, capacity determination, storage networking, cache placement, and rack layouts
Work closely with the storage software and networking teams to execute on cross-functional infrastructure initiatives and new data-center deployments, including integration of storage protocols across a variety of on-prem storage solutions
Work with procurement, data-center operations, and fleet engineering teams to deploy storage solutions into new and existing data centers
Work with vendors to troubleshoot customer performance, reliability, and data-integrity issues
Work closely with Networking, Compute, and Storage Software Engineering teams to deploy high-performance distributed storage solutions to serve AI/ML workloads
Partner with the fleet engineering team to ensure seamless deployment, monitoring, and maintenance of the distributed storage solutions
Stay current with the latest trends and research into AI and HPC storage technologies and vendor solutions
Guide the team in investigating strategies for using NVIDIA SuperNIC DPUs for storage edge-caching, offloading, and GPUDirect Storage capabilities
Work with the Lambda product team to uncover new trends in the AI inference and training product category that will inform emerging storage solutions
Encourage and support the team in exploring new technologies and approaches to improve system performance and efficiency

Qualifications

HPC storage solutions, Distributed storage solutions, Storage lifecycle management, Storage protocols, Storage site reliability engineering, Cloud Service Provider experience, Project management, Team leadership, Agile methodologies, Innovation, Collaboration, Problem identification

Required

10+ years of experience in storage engineering, with at least 5 years in a management or lead role
Demonstrated experience leading a team of storage engineers and storage SREs on complex, cross-functional projects in a fast-paced startup environment
Extensive hands-on experience in designing, deploying, and maintaining distributed storage solutions in a CSP (Cloud Service Provider), NCP (Neo-Cloud provider), HPC-infrastructure integrator, or AI-infrastructure company
Experience with storage solutions serving volumes at a scale greater than 20PB
Strong project management skills, leading high-confidence planning, project execution, and delivery of team outcomes on schedule
Extensive experience with storage site reliability engineering
Experience with one or more of the following in an HPC or AI Infrastructure environment: Vast, DDN, Pure Storage, NetApp, Weka
Experience deploying Ceph at a scale greater than 25PB
Experience in serving one or more of the following storage protocols: object storage (e.g., S3), block storage (e.g., iSCSI), or file storage (e.g., NFS, SMB, Lustre)
Professional individual contributor experience as a storage engineer or storage SRE
Familiarity with modern storage technologies (e.g., NVMe, RDMA, DPUs) and their role in optimizing performance
Experience building a high-performance team through deliberate hiring, upskilling, planned skills redundancy, performance management, and expectation setting

Preferred

Experience driving cross-functional engineering management initiatives (coordinating events, strategic planning, coordinating large projects)
Experience with NVIDIA SuperNIC DPUs for edge-caching (such as implementing GPUDirect Storage)
Deep experience with Vast, Weka and/or NetApp in an HPC or AI Infrastructure environment
Deep experience implementing Ceph in an HPC or AI infrastructure environment at a scale greater than 100PB
Experience driving organizational improvements (processes, systems, etc.)
Experience training or managing managers

Benefits

Health, dental, and vision coverage for you and your dependents
Wellness and Commuter stipends for select roles
401k Plan with 2% company match (USA employees)
Flexible Paid Time Off Plan that we all actually use

Company

Lambda

Lambda is a cloud-based platform that provides high-performance GPU hardware and cloud infrastructure for AI model training and inference.

H1B Sponsorship

Lambda has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (16)
2024 (1)
2023 (3)
2022 (2)
2021 (2)
2020 (3)

Funding

Current Stage: Late Stage
Total Funding: $3.19B
Key Investors: TWG Global, JP Morgan, Macquarie Group
2025-11-18 · Series E · $1.5B
2025-08-19 · Debt Financing · $275M
2025-02-19 · Series D · $480M

Leadership Team

Stephen Balaban
Co-founder, CEO
Michael Balaban
Co-founder, CTO
Company data provided by Crunchbase