Research Scientist / Engineer – Multimodal Capabilities jobs in United States

Luma AI · 4 hours ago

Research Scientist / Engineer – Multimodal Capabilities

Luma AI is dedicated to building multimodal AI to enhance human capabilities. The role involves conducting pioneering research to define the future capabilities of multimodal models, designing experiments, and collaborating with research teams to translate findings into product experiences.

Artificial Intelligence (AI), Generative AI, Video, Video Editing
H1B Sponsor Likely

Responsibilities

Research and define the next frontier of multimodal capabilities, identifying key gaps in our current models and designing the experiments to solve them
Design and execute novel experiments, datasets, and methodologies to systematically improve model performance across vision, audio, and language
Develop and pioneer new evaluation frameworks and benchmarking approaches to precisely measure novel multimodal behaviors and capabilities
Collaborate deeply with other research teams to translate your findings into our core training recipes and unlock new product experiences
Build and prototype compelling demonstrations that showcase the groundbreaking multimodal capabilities you have unlocked

Qualifications

Python, PyTorch, Multimodal data pipelines, Computer vision, Natural language processing, Audio processing, Research experience, Publication record, Collaboration, Problem-solving

Required

You have a PhD or equivalent research experience in a field related to AI, Machine Learning, or Computer Science
You have strong programming skills in Python and deep, hands-on experience with PyTorch
You have a proven track record of working with multimodal data pipelines and curating large-scale datasets for research
You possess a deep, fundamental understanding of at least one of the core modalities: computer vision, audio processing, or natural language processing
You thrive on tackling the most ambitious, open-ended research challenges in a fast-paced, collaborative environment

Preferred

Direct expertise working with complex, interleaved multimodal data (video, audio, text)
Hands-on experience training or fine-tuning Vision Language Models (VLMs), Audio Language Models, or large-scale generative video models from scratch
A strong publication record in top-tier AI conferences (e.g., NeurIPS, ICML, CVPR, ICLR)
Experience leading ambitious, open-ended research projects from ideation to tangible results

Company

Luma AI

Luma AI develops tools that let users generate photorealistic images and videos from text, image, or video prompts.

H1B Sponsorship

Luma AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (10)
2024 (3)

Funding

Current Stage: Growth Stage
Total Funding: $1.06B
Key Investors: HUMAIN, Andreessen Horowitz, Amplify Partners
2025-11-19 · Series C · $900M
2024-12-06 · Series B · $90M
2024-01-09 · Series B · $43M

Leadership Team

Amit Jain
Co-Founder
Company data provided by Crunchbase