Zoom · 16 hours ago
AI Inference Engineer - Speech
Zoom is a company focused on building the best collaboration platform for the enterprise. They are seeking an AI Inference Engineer to develop state-of-the-art automatic speech recognition systems and optimize model inference performance for various Zoom products.
Collaboration · Information Technology · Messaging · SaaS · Video Conferencing
Responsibilities
Developing state-of-the-art speech services for Zoom products. Devising novel techniques where off-the-shelf solutions are not available
Optimizing ASR inference systems for production deployment, including inference latency, throughput, memory footprint, and resource utilization
Optimizing model inference performance by diving deep into the lower layers of the inference framework stack, with a focus on hardware-specific optimizations for NVIDIA GPUs
Proposing new model structures by joint optimization of model accuracy and inference speed
Designing and developing ASR systems with low latency and high accuracy requirements, while ensuring the scalability of GPU infrastructure and improving the throughput of the ASR service
Profiling and debugging ASR runtime performance bottlenecks across different deployment hardware and environments
Qualifications
Required
Possess a Master's degree in Computer Science, Electrical Engineering, or a related field, with 3+ years of experience in speech recognition, speech-LLMs, or AI model inference
Display knowledge of deep learning and hands-on programming skills in Python, shell scripting, and C/C++; familiarity with ML frameworks such as PyTorch and TensorFlow
Demonstrate deep understanding of transformer encoder-decoder frameworks for speech recognition, including attention mechanisms, beam search and sequence-to-sequence modeling for end-to-end ASR systems
Understand recent advancements in speech foundation models and speech-LLMs that integrate acoustic and linguistic representations, enabling unified modeling for speech understanding and transcription tasks
Have experience optimizing deep learning model inference on NVIDIA GPUs, including profiling and accelerating AI models using CUDA, TensorRT, and mixed-precision computation to achieve low-latency, high-throughput performance
Have experience developing and tuning custom CUDA kernels, leveraging CUDA Graphs for efficient execution scheduling, and minimizing kernel launch overhead to maximize GPU utilization
Be proficient in end-to-end performance analysis, memory optimization, and deployment of large-scale ML models on GPU clusters. Have experience with stream management, asynchronous execution, and integrating frameworks such as PyTorch and TensorFlow for real-time inference
Benefits
A variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health
Support for work-life balance
Opportunities for employees to contribute to their community in meaningful ways
Company
Zoom
Zoom is a software company that offers a communications platform that connects people through video, voice, chat, and content sharing.
H1B Sponsorship
Zoom has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (16)
2024 (178)
2023 (144)
2022 (259)
2021 (86)
2020 (34)
Funding
Current Stage
Public Company
Total Funding
$276M
Key Investors
ARK Investment Management, Sequoia Capital, Emergence Capital
2021-11-04 · Post-IPO Equity · $130M
2019-04-19 · Post-IPO Equity
2019-04-18 · IPO
Company data provided by Crunchbase