Student Researcher [Seed Multimodality & World Model – RL + Streaming Video Understanding] – 2026 Start (PhD) jobs in United States

ByteDance · 7 hours ago

ByteDance is a pioneering technology company focused on advanced AI foundation models. It is seeking a PhD intern to help develop real-time multimodal agents for streaming video tasks, with a focus on challenges in model architecture and reinforcement learning algorithms.

Content, Data Mining, Foundational AI, Internet, Social Media
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Conduct research on streaming video understanding, especially for first-person or long-horizon applications, where the agent must continuously observe, interpret, and act
Apply reinforcement learning to improve real-time perception and planning capabilities of streaming agents, including learning from human feedback, demonstrations, and/or verifiable rewards
Build or enhance scalable data pipelines that convert offline video datasets into streaming-compatible formats, enabling the development of new agent capabilities
Design and evaluate video agents that integrate LLMs/VLMs with decision-making components for downstream applications (e.g., tool use, retrieval, resolution switching)

Qualifications

Reinforcement learning, Streaming video understanding, Data engineering, Computer Vision, Machine Learning, Large-scale data pipelines, Software engineering, First-author publications, LLMs

Required

Currently pursuing a PhD in Computer Vision, Machine Learning, or a related field
Research experience in video generation, world models, or dynamics modeling
First-author publications in CVPR, ICCV, ECCV, NeurIPS, ICLR, or ICML
Research experience in one or more of the following areas:
  Streaming video understanding, online video processing, or sequential decision making from continuous visual inputs
  Reinforcement learning (RL), especially when combined with LLMs or multimodal models (e.g., decision-making with VLMs, generative agents, action-planning)
  Data engineering, such as synthetic data generation, prompt engineering, scalable data pipeline curation

Preferred

Strong software engineering skills and ability to work in existing infrastructure (e.g., PyTorch, distributed training frameworks)
Familiarity with streaming video processing in multimodal LLMs
Experience working with RL for LLMs or multimodal LLMs
Experience working with large-scale data pipelines, including multimodal dataset processing and task-specific synthetic data generation

Benefits

Day one access to health insurance
Life insurance
Wellbeing benefits
10 paid holidays per year
Paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half)
Housing allowance

Company

ByteDance

ByteDance is a technology company that develops content creation platforms and services.

H1B Sponsorship

ByteDance has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (1350)
2024 (1123)
2023 (775)
2022 (487)
2021 (417)
2020 (245)

Funding

Current Stage: Late Stage
Total Funding: $9.8B
Key Investors: Capital Today, G42, Tiger Global Management
2025-11-20 · Secondary Market · $300M
2024-07-25 · Secondary Market
2023-03-14 · Secondary Market · $100M

Leadership Team

Jochen Bischoff · Head of Global Business Solutions - Africa
Matty Lin · General Manager, Global Business Solutions, KR
Company data provided by Crunchbase