Tencent · 2 months ago
Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems)
Tencent is a leading technology company focused on innovation and development. It is seeking a Research Scientist to join its core research team to work on large-scale multimodal model systems for speech and audio understanding and to advance research in speech representation and multimodal applications.
Advertising · Internet · Online Games · Online Portals · Social Media Marketing
Responsibilities
Develop general-purpose, end-to-end large speech models covering multilingual automatic speech recognition (ASR), speech translation, speech synthesis, paralinguistic understanding, and general audio understanding
Advance research on speech representation learning and encoder/decoder architectures to build unified acoustic representations for multi-task and multimodal applications
Explore representation alignment and fusion mechanisms between audio/speech and other modalities in large multimodal models, enabling joint modeling with images and text
Build and maintain high-quality multimodal speech datasets, including automatic annotation and data synthesis technologies
Qualifications
Required
Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, Linguistics, or a related field; or Master's degree with several years of relevant experience
Solid understanding of speech and audio signal processing, acoustic modeling, language modeling, and large model architectures
Proficient in one or more core speech system development pipelines, such as ASR, text-to-speech (TTS), or speech translation; experience with multilingual, multitask, or end-to-end systems is a plus
Familiar with Transformer-based architectures and their applications in speech and multimodal training/inference
Preferred
Candidates with in-depth research or practical experience in the following areas are strongly preferred:
Speech representation pretraining (e.g., HuBERT, Wav2Vec, Whisper)
Multimodal alignment and cross-modal modeling (e.g., audio-visual-text)
Experience driving state-of-the-art (SOTA) performance on audio understanding tasks with large models
Proficient in deep learning frameworks such as PyTorch or TensorFlow; experience with large-scale training and distributed systems is a plus
Benefits
Sign-on payment
Relocation package
Restricted stock units
Medical, dental, vision, life and disability benefits
Participation in the Company’s 401(k) plan
15 to 25 days of vacation per year
Up to 13 holidays per calendar year
Up to 10 days of paid sick leave per year
Company
Tencent
Tencent is an internet service portal offering value-added internet, mobile, telecom, and online advertising services.
H1B Sponsorship
Tencent has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Distribution of Different Job Fields Receiving Sponsorship (chart not reproduced)
Trends of Total Sponsorships
2025 (3)
2024 (11)
2023 (2)
2022 (2)
Funding
Current Stage: Public Company
Total Funding: $13.84B
Key Investors: Lippo Group
2025-09-16 · Post-IPO Debt · $1.27B
2020-05-29 · Post-IPO Debt · $6B
2019-08-29 · Post-IPO Debt · $6.5B
Company data provided by Crunchbase