Research Scientist Intern - Machine Learning System @ ByteDance | Jobright.ai
Research Scientist Intern - Machine Learning System jobs in San Jose, CA

ByteDance · 3 days ago

Research Scientist Intern - Machine Learning System

Responsibilities

Research and develop machine learning systems, including heterogeneous computing architecture, management, scheduling, and monitoring.
Manage cross-layer optimization across systems, AI algorithms, and machine learning hardware (GPU, ASIC).
Implement general-purpose training framework features and model-specific optimizations.
Improve the efficiency and stability of extremely large-scale distributed training jobs.

Qualifications

Machine Learning Algorithms, PyTorch, JAX, GPU, ASIC, Programming Languages, Linux Environment, C/C++, CUDA, Python, Work Authorization, High Performance Computing, RDMA Network, MPI, NCCL, ibverbs, Distributed Training, DeepSpeed, FSDP, Megatron, GSPMD, AI Compiler Stacks, torch.fx, XLA, MLIR, Large Scale Data Processing, Parallel Computing, Cloud Computing, Machine Learning, CUDA Programming

Required

Currently enrolled in a PhD program, with a grounding in distributed and parallel computing principles and knowledge of recent advances in computing, storage, networking, and hardware technologies.
Familiar with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX.
Have a basic understanding of how GPUs and/or ASICs work.
Expert in at least one or two programming languages in a Linux environment: C/C++, CUDA, Python.
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.

Preferred

GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs).
Optimization of distributed training frameworks such as DeepSpeed, FSDP, Megatron, and GSPMD.
AI compiler stacks such as torch.fx, XLA, and MLIR.
Large-scale data processing and parallel computing.
Experience designing and operating large-scale systems in cloud computing or machine learning.
Experience with in-depth CUDA programming and performance tuning (CUTLASS, Triton).

Benefits

100% premium coverage for full-time intern medical insurance after 90 days
Paid holidays
Paid sick leave
Employee Assistance Program for mental and emotional health benefits
Reimbursement for mobile phone expenses

Company

ByteDance

ByteDance is an internet technology company that operates creative content platforms.

Funding

Current Stage: Late Stage
Total Funding: $9.51B
Key Investors: G42, Tiger Global Management, General Atlantic
2023-03-15 · Secondary Market · $100M
2020-12-11 · Private Equity · $2B
2020-03-30 · Secondary Market · Undisclosed

Leadership Team

Julie Gao, CFO
Ahmed Hany, Principal Sales Engineer
Company data provided by Crunchbase
