Member of Technical Staff - Training Infrastructure Engineer jobs in United States

Liquid AI · 1 week ago

Member of Technical Staff - Training Infrastructure Engineer

Liquid AI, spun out of MIT, is focused on building efficient AI systems at every scale. They are seeking a Training Infrastructure Engineer to design and implement high-performance training infrastructure for their GPU clusters, enabling the development of specialized and large-scale multimodal models.

Artificial Intelligence (AI) · Foundational AI · Generative AI · Information Technology · Machine Learning
H1B Sponsor Likely

Responsibilities

Design and implement high-performance, scalable training infrastructure that efficiently utilizes our GPU clusters for both specialized and large-scale multimodal models
Build robust data loading systems that eliminate I/O bottlenecks and enable training on diverse multimodal datasets
Develop sophisticated checkpointing mechanisms that balance memory constraints with recovery needs across different model scales
Optimize communication patterns between nodes to minimize the overhead of distributed training for long-running experiments
Collaborate with ML engineers to implement new model architectures and training algorithms at scale
Create monitoring and debugging tools to ensure training stability and resource efficiency across our infrastructure
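The checkpointing responsibility above, balancing storage limits against recovery needs, can be illustrated with a minimal, hypothetical sketch in plain Python (stdlib only; `CheckpointRotator` and its file layout are illustrative and not part of any framework named in this posting):

```python
import json
import os
import tempfile

class CheckpointRotator:
    """Hypothetical sketch of a rotating checkpoint store: keeps the
    most recent `keep` checkpoints on disk and prunes older ones,
    trading storage for recovery granularity."""

    def __init__(self, directory, keep=3):
        self.directory = directory
        self.keep = keep
        os.makedirs(directory, exist_ok=True)

    def _path(self, step):
        return os.path.join(self.directory, f"ckpt_{step:08d}.json")

    def save(self, step, state):
        # Write atomically: dump to a temp file, then rename, so a crash
        # mid-write never leaves a truncated "latest" checkpoint behind.
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"step": step, "state": state}, f)
        os.replace(tmp, self._path(step))
        self._prune()

    def _prune(self):
        # Drop everything except the newest `keep` checkpoints.
        for step in self.steps()[:-self.keep]:
            os.remove(self._path(step))

    def steps(self):
        return sorted(
            int(name[5:13])
            for name in os.listdir(self.directory)
            if name.startswith("ckpt_") and name.endswith(".json")
        )

    def load_latest(self):
        steps = self.steps()
        if not steps:
            return None
        with open(self._path(steps[-1])) as f:
            return json.load(f)
```

Real training-infrastructure checkpointers additionally shard model and optimizer state across ranks; the atomic rename-on-write pattern shown here is the part that keeps a recovery point valid even if a node dies mid-save.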

Qualifications

Distributed training infrastructure · PyTorch Distributed · DeepSpeed · Megatron-LM · Hardware accelerators · Networking topologies · Performance bottlenecks · Data pipelines · Open-source contributions · Checkpointing systems · Multimodal datasets · Collaboration with ML engineers

Required

Extensive experience building distributed training infrastructure for language and multimodal models, with hands-on expertise in frameworks like PyTorch Distributed, DeepSpeed, or Megatron-LM
Passion for solving complex systems challenges in large-scale model training, from efficient multimodal data loading to sophisticated sharding strategies to robust checkpointing mechanisms
Deep understanding of hardware accelerators and networking topologies, with the ability to optimize communication patterns for different parallelism strategies
Skill at identifying and resolving performance bottlenecks in training pipelines, whether they occur in data loading, computation, or communication between nodes
Experience working with diverse data types (text, images, video, audio) and the ability to build data pipelines that handle heterogeneous inputs efficiently

Preferred

Implemented custom sharding techniques (tensor/pipeline/data parallelism) to scale training across distributed GPU clusters of varying sizes
Experience optimizing data pipelines for multimodal datasets with sophisticated preprocessing requirements
Built fault-tolerant checkpointing systems that can handle complex model states while minimizing training interruptions
Contributed to open-source training infrastructure projects or frameworks
Designed training infrastructure that works efficiently for both parameter-efficient specialized models and massive multimodal systems
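The data-parallel sharding called out above can be sketched in plain Python. The `shard_indices` helper below is hypothetical; it mirrors the index-partitioning idea behind samplers such as PyTorch's DistributedSampler (disjoint, equally sized slices per rank, with wrap-around padding so every rank takes the same number of steps), without depending on PyTorch itself:

```python
def shard_indices(num_samples, world_size, rank):
    """Hypothetical sketch of data-parallel index sharding: each rank
    gets a disjoint, equally sized slice of the dataset, padded by
    wrapping around so all ranks run the same number of steps."""
    # Ceil-divide so the padded total splits evenly across ranks.
    per_rank = -(-num_samples // world_size)
    total = per_rank * world_size
    # Wrap-around padding: the last few padded indices reuse the
    # start of the dataset.
    indices = [i % num_samples for i in range(total)]
    # Strided assignment: rank r takes indices r, r + world_size, ...
    return indices[rank:total:world_size]
```

Equal shard sizes matter in practice because collective operations (e.g. gradient all-reduce) block until every rank participates; a rank with fewer batches would deadlock the others.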

Company

Liquid AI

Build efficient general-purpose AI at every scale.

H1B Sponsorship

Liquid AI has a track record of offering H1B sponsorships. Note that this does not guarantee sponsorship for this specific role. The information below is provided for reference (data powered by the US Department of Labor).
[Charts: distribution of job fields receiving sponsorship, highlighting fields similar to this job; total sponsorships per year — 2025: 2]

Funding

Current Stage
Growth Stage
Total Funding
$293.1M
Key Investors
AMD Ventures · OSS Capital L.P.
2024-12-13 · Series A · $250M
2023-12-01 · Seed · $37.5M
2023-05-05 · Seed · $5.6M

Leadership Team

Ramin Hasani
Co-founder and CEO
Mathias Lechner
Co-founder and CTO
Company data provided by Crunchbase