2077AI Open Source Foundation · 1 week ago

Research Intern (Video)

United States

Internship

Remote

Intern

$25/hr - $35/hr

2077AI Open Source Foundation is seeking a Research Intern to advance video understanding and multimodal reasoning. The role involves working on cutting-edge datasets and benchmarks with a direct impact on large-scale video QA, action recognition, and audio-visual understanding.

Computer Software

Responsibilities

Build and refine datasets for video understanding and multimodal reasoning, including temporal QA, action recognition, event prediction, and spatial understanding

Evaluate video-language models (Video-LLMs) and audio-visual datasets, including those derived from large-scale sources such as HowTo100M

Conduct experiments analyzing long-context modeling efficiency, compression strategies, and data optimization techniques

Contribute to benchmark standardization efforts and assist in setting up public leaderboards for evaluation and comparison

Qualifications

Computer Vision, Video Analytics, Multimodal Learning, Video Data Processing, Transformer-based Models, Video-QA, Action Recognition, Relevant Publications

Required

Strong background in computer vision, video analytics, or multimodal learning

Proficiency in building and managing video data processing pipelines

Understanding of transformer-based temporal models (e.g., TimeSformer, VideoGPT)

Preferred

Experience with video-QA, action recognition, or multimodal reasoning datasets

Relevant publications in top-tier conferences

Company

2077AI Open Source Foundation

The 2077AI Foundation is at the forefront of AI data standardization and progression.

Singapore, SG

51-200 employees

https://www.2077ai.com/

Funding

Current Stage

Growth Stage

Company data provided by Crunchbase