Zoom · 1 week ago

PhD Audio AI Engineer (Speech Conversion, TTS & ASR)

San Jose, CA

Full-time

Hybrid

Senior Level

$128K/yr - $255K/yr

Zoom is a company that helps people stay connected through innovative communication solutions. They are seeking an Audio AI Engineer to research and develop algorithms for accent conversion, voice conversion, speech synthesis, and speech recognition, focusing on low-latency streaming architectures.

Collaboration, Information Technology, Messaging, SaaS, Video Conferencing

H1B Sponsor Likely

Responsibilities

Researching, designing, and developing algorithms for accent conversion, voice conversion, speech synthesis, and automatic speech recognition, focusing on low-latency streaming architectures

Prototyping end-to-end audio models that enhance intelligibility and naturalness while preserving speaker identity and expressiveness

Collaborating closely with product and platform teams to integrate models into real-time video and audio communication systems

Analyzing and optimizing model performance across speech quality, latency, robustness, and scalability dimensions

Staying current with the latest developments in speech processing research, and contributing to the community through patents and internal knowledge sharing

Qualifications

Deep learning frameworks, Speech synthesis, Automatic speech recognition, Voice conversion, Python programming, C/C++ programming, Sequence modeling architectures, Model compression techniques, Real-time audio systems, Research publication, Fluency in Mandarin

Required

Hold a PhD or equivalent experience in a relevant field such as streaming, voice conversion, TTS, or ASR

Show proficiency in deep learning frameworks like PyTorch or TensorFlow

Demonstrate effective programming skills in Python, C/C++, or similar languages

Have an understanding of sequence modeling architectures (Transformers, RNNs, diffusion models, or conformers)

Demonstrate experience developing and deploying low-latency, real-time speech or audio models with streaming architectures and optimized pipelines

Show familiarity with model compression and acceleration techniques, including quantization, pruning, and distillation

Exhibit experience working with real-time audio systems in networked communication environments

Have a record of publications in top-tier conferences such as ICASSP, INTERSPEECH, NeurIPS, and ICLR

Must be fluent in Mandarin

Benefits

A variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health

Support work-life balance

Contribute to their community in meaningful ways

Company

Zoom

Zoom is a software company that offers a communications platform that connects people through video, voice, chat, and content sharing.

Founded in 2011

San Jose, California, USA

5001-10000 employees

https://www.zoom.com

H1B Sponsorship

Zoom has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships

2025 (16)

2024 (178)

2023 (144)

2022 (259)

2021 (86)

2020 (34)

Funding

Current Stage

Public Company

Total Funding

$276M

Key Investors

ARK Investment Management, Sequoia Capital, Emergence Capital

2021-11-04: Post-IPO Equity, $130M

2019-04-19: Post-IPO Equity

2019-04-18: IPO

Leadership Team

Eric Yuan

Founder & CEO

Xuedong Huang

Chief Technology Officer

Recent News

IndiaTimes

MarkupAI CEO creates ‘fantasy board of directors’ with Steve Jobs, Warren Buffett, and others to help him plan meetings

2026-01-16

Help Net Security

LinkedIn wants to make verification a portable trust signal

2026-01-16

Benzinga.com

Beyond The Numbers: 13 Analysts Discuss Zoom Communications Stock

2026-01-13

Company data provided by Crunchbase