Multimodal AI Researcher, Audio jobs in United States

Dolby Laboratories · 13 hours ago

Multimodal AI Researcher, Audio

Dolby Laboratories, a leader in entertainment innovation, is seeking a Multimodal AI Researcher, Audio, to drive innovation in multimodal AI for audio applications. The role involves creating and implementing multimodal and audio AI technologies, partnering with other experts, and contributing to cutting-edge projects in the audio domain.

Audio, Broadcasting, Consumer, Hardware, Media and Entertainment, Video
H1B Sponsor Likely

Responsibilities

Partner closely with other domain experts to refine and execute Dolby’s technical strategy in artificial intelligence and machine learning
Use deep learning to create new solutions (including foundation models) and enhance existing applications
Push the state-of-the-art and develop intellectual property
Transfer technology to product groups
Establish research collaborations with external university partners
Mentor interns on novel research problems
Publish papers in top-tier conferences and journals
Advise internal leaders on recent deep learning advancements in the industry and academia to further influence research direction and business decisions

Qualifications

Deep learning, Generative modeling, Multimodal AI, Python, TensorFlow, PyTorch, Audio fundamentals, Communication, Collaboration skills

Required

Ph.D. in Computer Science or similar field
A strong background in deep learning, in both conceptual understanding and practical experience
Technical knowledge of audio fundamentals
Deep passion for audio, music, and multimedia applications
Deep knowledge of current machine learning literature
Strong publication record; publications in major machine learning conferences (e.g., NeurIPS, ICLR, ICML) or top domain-specific conferences (e.g., ACL, CVPR, ICASSP, Interspeech) are desirable
Highly skilled in Python and one or more popular deep learning frameworks (TensorFlow or PyTorch)
Ability to envision new technologies and turn them into innovative products
Good communication and collaboration skills

Preferred

Experience with language models, question answering, vision-language models, captioning, etc.
Generative modeling for audio applications (diffusion models, autoregressive models, masked generative transformers)
Multimodal semantic understanding and multimodal reasoning
Multimodal representations (audio-video, audio-text, audio-video-text)
Multimodal AI architectures, with a focus on generating audio, music, and speech (text-to-audio, video-to-audio, image-to-audio)
Self-supervised and semi-supervised learning
AI-driven audio enhancement, processing, and generation (for speech and music), such as speech enhancement and analysis, source separation, text-to-speech, text-to-music, music information retrieval, and audio classification
LLMs for audio applications

Benefits

Bonus
Benefits
Equity

Company

Dolby Laboratories

Dolby creates surround sound, imaging, and voice technologies for cinemas, home theaters, PCs, mobile devices, games, and more.

H1B Sponsorship

Dolby Laboratories has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data Powered by US Department of Labor)
Trends of Total Sponsorships
2025 (31)
2024 (25)
2023 (19)
2022 (43)
2021 (55)
2020 (35)

Funding

Current Stage
Public Company
Total Funding
unknown
2005-02-17 IPO

Leadership Team

Kevin Yeaman
CEO
Robert Park
Chief Financial Officer
Company data provided by Crunchbase