Research Scientist TTS jobs in United States

Pocket FM · 1 day ago

Research Scientist TTS

Pocket FM is on a mission to deliver personalized and immersive audio experiences to listeners worldwide. They are seeking an experienced research scientist to drive innovation in long-form content generation and localization, focusing on creating culturally tailored storytelling experiences and developing state-of-the-art TTS systems.

Apps, Audio, Digital Entertainment, Information Technology, Media and Entertainment, Software
Growth Opportunities
H1B Sponsor Likely
Hiring Manager: Shashwat Shukla

Responsibilities

Model Development: Design, implement, and optimize modern neural TTS systems, including diffusion- and flow-based architectures, neural codec–based speech generation, and LLM-conditioned or hybrid speech synthesis models for expressive, long-form audio
Speech Controllability: Develop methods for fine-grained control over speech attributes like pitch, rhythm, emotion, and speaker style to enhance storytelling quality
Efficiency & Latency: Optimize models for real-time inference and high-scale production, using techniques like knowledge distillation and model quantization
Multilingual Synthesis: Spearhead research into cross-lingual and multilingual TTS to support global content localization
Quality Evaluation: Design and implement robust evaluation frameworks, including MOS (Mean Opinion Score) and objective metrics, to assess the naturalness and intelligibility of generated speech

Qualifications

Speech synthesis, Digital signal processing, Neural TTS systems, Python, Deep learning, Multilingual synthesis, TTS tooling, Research publication, Collaboration

Required

Demonstrated experience in speech synthesis, digital signal processing (DSP), and audio analysis
Proficiency with speech-specific frameworks and libraries such as Coqui TTS, ESPnet, or NVIDIA NeMo
Hands-on experience with sequence-to-sequence models, GANs, Variational Autoencoders (VAEs), and Diffusion models for audio
Experience building high-quality audio datasets, including for voice cloning and speaker verification, and handling prosody
Master's or PhD degree in Computer Science, Machine Learning, or a related field
Significant Python and applied research experience in industrial settings
Proficiency in frameworks such as PyTorch or TensorFlow
Demonstrated experience in deep learning, especially language modeling with transformers and machine translation
Prior experience working with vector databases, search indices, or other data stores for search and retrieval use cases
Preference for fast-paced, collaborative projects with concrete goals, quantitatively tested through A/B experiments
Published research in peer-reviewed journals and conferences on relevant topics

Company

Pocket FM

Pocket FM creates audio series platforms for long-form audio entertainment.

H1B Sponsorship

Pocket FM has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage: Late Stage
Total Funding: $212.52M
Key Investors: Silicon Valley Bank, Lightspeed India Partners, Tencent
2024-03-15 · Series D · $103M
2023-05-02 · Debt Financing · $16M
2022-03-03 · Series C · $64.83M

Leadership Team

Nishanth S., Co-Founder
Prateek Dixit, Co-Founder
Company data provided by Crunchbase