Dolby Laboratories · 13 hours ago
Multimodal AI Researcher, Audio
Dolby Laboratories, a leader in entertainment innovation, is seeking a Multimodal AI Researcher, Audio to advance multimodal AI for audio applications. The role involves creating and implementing multimodal and audio AI technologies, partnering with other experts, and contributing to cutting-edge projects in the audio domain.
Audio, Broadcasting, Consumer, Hardware, Media and Entertainment, Video
Responsibilities
Partner closely with other domain experts to refine and execute Dolby’s technical strategy in artificial intelligence and machine learning
Use deep learning to create new solutions (including foundation models) and enhance existing applications
Push the state-of-the-art and develop intellectual property
Transfer technology to product groups
Establish research collaborations with external university partners
Mentor interns on novel research problems
Publish papers in top-tier conferences and journals
Advise internal leaders on recent deep learning advancements in the industry and academia to further influence research direction and business decisions
Qualifications
Required
Ph.D. in Computer Science or similar field
A strong background in deep learning, including both conceptual understanding and practical experience
Technical knowledge of audio fundamentals
Deep passion for audio, music, and multimedia applications
Deep knowledge of current machine learning literature
Strong publication record; publications in major machine learning conferences (e.g., NeurIPS, ICLR, ICML) or top domain-specific conferences (e.g., ACL, CVPR, ICASSP, Interspeech) are desirable
Highly skilled in Python and one or more popular deep learning frameworks (TensorFlow or PyTorch)
Ability to envision new technologies and turn them into innovative products
Good communication and collaboration skills
Preferred
Experience with language models, question answering, vision-language models, captioning, etc.
Generative modeling for audio applications (diffusion models, autoregressive models, masked generative transformers)
Multimodal semantic understanding and multimodal reasoning
Multimodal representations (audio-video, audio-text, audio-video-text)
Multimodal AI architectures, with a focus on generating audio, music, and speech (text-to-audio, video-to-audio, image-to-audio)
Self-supervised and semi-supervised learning
AI-driven audio enhancement, processing, and generation (for speech and music), such as speech enhancement and analysis, source separation, text-to-speech, text-to-music, music information retrieval, and audio classification
LLMs for audio applications
Benefits
Bonus
Benefits
Equity
Company
Dolby Laboratories
Dolby creates surround sound, imaging, and voice technologies for cinemas, home theaters, PCs, mobile devices, games, and more.
H1B Sponsorship
Dolby Laboratories has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown; the highlighted field represents the job field similar to this job)
Trends of Total Sponsorships
2025 (31)
2024 (25)
2023 (19)
2022 (43)
2021 (55)
2020 (35)
Funding
Current Stage: Public Company
Total Funding: unknown
2005-02-17: IPO
Company data provided by Crunchbase