ByteDance · 14 hours ago

Student Researcher [Seed – Multimodal Interaction & World Model - Unified Model] – 2026 Start (PhD)

San Jose, CA

Internship

Onsite

Intern

$65/hr

Start in 2026

ByteDance is dedicated to pioneering advanced AI foundation models and is seeking a Student Researcher for its Seed Multimodal Interaction and World Model team. The role involves developing unified modeling architectures for multimodal foundation models and collaborating with researchers to scale and adapt models for real-world scenarios.

Content, Data Mining, Foundational AI, Internet, Social Media

Comp. & Benefits

H1B Sponsor Likely

Responsibilities

Develop and evaluate unified modeling architectures for multimodal foundation models across vision, audio, and language

Contribute to building a shared representation space that supports both generation and understanding tasks

Explore architectural and optimization strategies to improve generalization across modalities and tasks

Collaborate with researchers working on generation, reasoning, and world modeling to scale and adapt models for real-world scenarios

Qualifications

Generative modeling, Multimodal learning, Large-scale ML systems, Joint modeling strategies, Video generation, Vision-language pretraining, PhD in relevant field, Publications in top-tier venues, Interest in world modeling

Required

Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline

Publications in top-tier venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML

Strong research background in at least one of the following: generative modeling (e.g., diffusion models, transformers), multimodal learning, or representation learning

Solid engineering and modeling skills, with experience building and training large-scale ML systems

Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Preferred

Experience in building or training models for both generative and discriminative tasks

Familiarity with joint modeling strategies (e.g., multitask learning, contrastive alignment, autoregressive decoding for understanding)

Background in video generation, vision-language pretraining, or instruction-conditioned generation

Interest in long-context modeling, memory architectures, or world modeling tasks

Benefits

Day one access to health insurance

Life insurance

Wellbeing benefits

10 paid holidays per year

Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)

Housing allowance

Company

ByteDance

Glassdoor: 3.8

ByteDance is a technology company that develops content creation platforms and services.

Founded in 2012

Beijing, Beijing, CHN

10001+ employees

http://bytedance.com

H1B Sponsorship

ByteDance has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not shown)

Trends of Total Sponsorships

2025 (1350)

2024 (1123)

2023 (775)

2022 (487)

2021 (417)

2020 (245)

Funding

Current Stage

Late Stage

Total Funding

$9.8B

Key Investors

Capital Today, G42, Tiger Global Management

2025-11-20: Secondary Market · $300M

2024-07-25: Secondary Market

2023-03-14: Secondary Market · $100M

Leadership Team

Jochen Bischoff

Head of Global Business Solutions - Africa

Matty Lin

General Manager, Global Business Solutions, KR

Recent News

WebProNews

TikTok Restructures US Operations in Oracle Joint Venture Amid Regulations

2026-01-16

36kr.com

Under the shadow of Huawei, SERES' million-unit "coming-of-age ceremony"

2026-01-16

EqualOcean

Independent Variable Robotics Technology Completes RMB 1 Billion Series A++ Financing

2026-01-14

Company data provided by Crunchbase