Student Researcher [Seed – Multimodal Interaction & World Model - Unified Model] – 2026 Start (PhD) jobs in United States

ByteDance · 12 hours ago

Student Researcher [Seed – Multimodal Interaction & World Model - Unified Model] – 2026 Start (PhD)

ByteDance is dedicated to pioneering advanced AI foundation models, and they are seeking a Student Researcher for their Seed Multimodal Interaction and World Model team. The role involves developing unified modeling architectures for multimodal foundation models and collaborating with researchers to scale and adapt models for real-world scenarios.

Content, Data Mining, Foundational AI, Internet, Social Media
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Develop and evaluate unified modeling architectures for multimodal foundation models across vision, audio, and language
Contribute to building a shared representation space that supports both generation and understanding tasks
Explore architectural and optimization strategies to improve generalization across modalities and tasks
Collaborate with researchers working on generation, reasoning, and world modeling to scale and adapt models for real-world scenarios

Qualifications

Generative modeling, Multimodal learning, Large-scale ML systems, Joint modeling strategies, Video generation, Vision-language pretraining, PhD in relevant field, Publications in top-tier venues, Interest in world modeling

Required

Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
Publications in top-tier venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences in AI and ML
Strong research background in at least one of the following: generative modeling (e.g., diffusion models, transformers), multimodal learning, or representation learning
Solid engineering and modeling skills, with experience building and training large-scale ML systems
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Preferred

Experience in building or training models for both generative and discriminative tasks
Familiarity with joint modeling strategies (e.g., multitask learning, contrastive alignment, autoregressive decoding for understanding)
Background in video generation, vision-language pretraining, or instruction-conditioned generation
Interest in long-context modeling, memory architectures, or world modeling tasks

Benefits

Day one access to health insurance
Life insurance
Wellbeing benefits
10 paid holidays per year
Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
Housing allowance

Company

ByteDance

ByteDance is a technology company that develops content creation platforms and services.

H1B Sponsorship

ByteDance has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (1350)
2024 (1123)
2023 (775)
2022 (487)
2021 (417)
2020 (245)

Funding

Current Stage
Late Stage
Total Funding
$9.8B
Key Investors
Capital Today, G42, Tiger Global Management
2025-11-20 · Secondary Market · $300M
2024-07-25 · Secondary Market
2023-03-14 · Secondary Market · $100M

Leadership Team

Jochen Bischoff
Head of Global Business Solutions - Africa
Matty Lin
General Manager, Global Business Solutions, KR
Company data provided by Crunchbase