IntelliPro · 3 months ago
Research Scientist / Engineer – Training Infrastructure
IntelliPro is a global leader in talent acquisition and HR solutions, dedicated to building capable systems that can understand and interact with the world. They are seeking a Research Scientist / Engineer to design and optimize distributed training systems for multimodal foundation models using advanced techniques and large GPU clusters.
Consulting · Human Resources · Recruiting · Staffing Agency
Responsibilities
Design, implement, and optimize efficient distributed systems for training models across thousands of GPUs
Research and implement advanced parallelization techniques (FSDP, Tensor Parallel, Pipeline Parallel, Expert Parallel)
Build monitoring, visualization, and debugging tools for large-scale training runs
Optimize training stability, convergence, and resource utilization across massive clusters
Qualifications
Required
Extensive experience with distributed PyTorch training and parallelism strategies for foundation model training
Deep understanding of GPU clusters, networking, and storage systems
Familiarity with communication libraries (NCCL, MPI) and distributed system optimization
Preferred
Strong Linux systems administration and scripting capabilities
Experience managing training runs across >100 GPUs
Experience with containerization, orchestration, and cloud infrastructure
Benefits
Comprehensive benefits package
Company
IntelliPro
IntelliPro Group Inc. is one of the fastest-growing IT services and HR solutions companies in the Americas & APAC.
H1B Sponsorship
IntelliPro has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (16)
2024 (21)
2023 (33)
2022 (28)
2021 (34)
2020 (36)
Funding
Current Stage
Late Stage
Recent News
2024-05-21
Built In San Francisco
2024-04-07
Built In San Francisco
2024-04-07
Company data provided by Crunchbase