PhD Audio AI Engineer (Speech Conversion, TTS & ASR) jobs in United States

Zoom · 1 week ago

PhD Audio AI Engineer (Speech Conversion, TTS & ASR)

Zoom is a company that helps people stay connected through innovative communication solutions. They are seeking an Audio AI Engineer to research and develop algorithms for accent conversion, voice conversion, speech synthesis, and speech recognition, focusing on low-latency streaming architectures.

Collaboration · Information Technology · Messaging · SaaS · Video Conferencing
H1B Sponsor Likely

Responsibilities

Researching, designing, and developing algorithms for accent conversion, voice conversion, speech synthesis, and automatic speech recognition, focusing on low-latency streaming architectures
Prototyping end-to-end audio models that enhance intelligibility and naturalness while preserving speaker identity and expressiveness
Collaborating closely with product and platform teams to integrate models into real-time video and audio communication systems
Analyzing and optimizing model performance across speech quality, latency, robustness, and scalability dimensions
Staying current with the latest developments in speech processing research and contributing to the community through patents and internal knowledge sharing

Qualifications

Deep learning frameworks · Speech synthesis · Automatic speech recognition · Voice conversion · Python programming · C/C++ programming · Sequence modeling architectures · Model compression techniques · Real-time audio systems · Research publication · Fluency in Mandarin

Required

Hold a PhD, or have equivalent experience, in a relevant field (streaming, voice conversion, TTS, or ASR)
Show proficiency in deep learning frameworks like PyTorch or TensorFlow
Demonstrate effective programming skills in Python, C/C++, or similar languages
Have an understanding of sequence modeling architectures (Transformers, RNNs, diffusion models, or conformers)
Demonstrate experience developing and deploying low-latency, real-time speech or audio models with streaming architectures and optimized pipelines
Show familiarity with model compression and acceleration techniques, including quantization, pruning, and distillation
Exhibit experience working with real-time audio systems in networked communication environments
Publish in top-tier conferences such as ICASSP, INTERSPEECH, NeurIPS, and ICLR
Must be fluent in Mandarin

Benefits

A variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health
Support for work-life balance
Opportunities for employees to contribute to their community in meaningful ways

Company

Zoom

Zoom is a software company that offers a communications platform that connects people through video, voice, chat, and content sharing.

H1B Sponsorship

Zoom has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data provided by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship. Legend: represents job field similar to this job]
Trends of Total Sponsorships
2025 (16)
2024 (178)
2023 (144)
2022 (259)
2021 (86)
2020 (34)

Funding

Current Stage
Public Company
Total Funding
$276M
Key Investors
ARK Investment Management · Sequoia Capital · Emergence Capital
2021-11-04 · Post-IPO Equity · $130M
2019-04-19 · Post-IPO Equity
2019-04-18 · IPO

Leadership Team

Eric Yuan
Founder & CEO
Xuedong Huang
Chief Technology Officer
Company data provided by Crunchbase