2077AI Open Source Foundation · 1 week ago
Research Intern (Video)
2077AI Open Source Foundation is seeking a Research Intern to advance video understanding and multimodal reasoning. The role involves working on cutting-edge datasets and benchmarks with a direct impact on large-scale video QA, action recognition, and audio-visual understanding.
Computer Software
Responsibilities
Build and refine datasets for video understanding and multimodal reasoning, including temporal QA, action recognition, event prediction, and spatial understanding
Evaluate video-language models (Video-LLMs) and audio-visual datasets, including those derived from large-scale sources such as HowTo100M
Conduct experiments analyzing long-context modeling efficiency, compression strategies, and data optimization techniques
Contribute to benchmark standardization efforts and assist in setting up public leaderboards for evaluation and comparison
Qualifications
Required
Strong background in computer vision, video analytics, or multimodal learning
Proficient in building and managing video data processing pipelines
Understanding of transformer-based temporal models (e.g., TimeSformer, VideoGPT)
Preferred
Experience with video-QA, action recognition, or multimodal reasoning datasets
Relevant publications in top-tier conferences
Company
2077AI Open Source Foundation
The 2077AI Foundation is at the forefront of AI data standardization and advancement.
Funding
Current Stage
Growth Stage
Company data provided by Crunchbase