ByteDance · 2 days ago
Research Scientist Intern - Machine Learning System
Content · Data Mining
Responsibilities
Research and develop efficient machine learning systems, including training techniques that apply rank reduction and communication compression to optimizer states, parameters, and gradients.
Develop a state-of-the-art asynchronous training framework that ensures convergence.
Implement general-purpose training framework features and model-specific optimizations.
Improve efficiency and stability for extremely large-scale distributed training jobs.
Qualifications
Required
Currently enrolled in a PhD program, with a background in distributed and parallel computing principles and knowledge of recent advances in computing, storage, networking, and hardware technologies
Familiar with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX
Have a basic understanding of how GPUs and/or ASICs work
Expert in at least one or two of the following programming languages in a Linux environment: C/C++, CUDA, Python
Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment
Preferred
GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs)
Optimization of distributed training frameworks such as DeepSpeed, FSDP, Megatron, GSPMD
AI compiler stacks such as torch.fx, XLA and MLIR
Large-scale data processing and parallel computing
Experience designing and operating large-scale systems in cloud computing or machine learning
In-depth experience with CUDA programming and performance tuning (CUTLASS, Triton)
Benefits
Paid holidays
Paid sick leave
Employee Assistance Program for mental and emotional health benefits
Mobile phone expense reimbursements
Company
ByteDance
ByteDance is an internet technology company that operates creative content platforms.
Funding
Current Stage: Late Stage
Total Funding: $9.51B
Key Investors: G42, Tiger Global Management, General Atlantic
2023-03-15 · Secondary Market · $100M
2020-12-11 · Private Equity · $2B
2020-03-30 · Secondary Market · Undisclosed
Company data provided by crunchbase