Pocket FM · 1 day ago

Research Scientist TTS

San Francisco Bay Area

Full-time

Hybrid

Senior Level

$200K/yr - $220K/yr

Pocket FM is on a mission to deliver personalized and immersive audio experiences to listeners worldwide. They are seeking an experienced research scientist to drive innovation in long-form content generation and localization, focusing on creating culturally tailored storytelling experiences and developing state-of-the-art TTS systems.

Apps · Audio · Digital Entertainment · Information Technology · Media and Entertainment · Software

Growth Opportunities

H1B Sponsor Likely

Hiring Manager

Shashwat Shukla

Responsibilities

Model Development: Design, implement, and optimize modern neural TTS systems, including diffusion- and flow-based architectures, neural codec–based speech generation, and LLM-conditioned or hybrid speech synthesis models for expressive, long-form audio

Speech Controllability: Develop methods for fine-grained control over speech attributes like pitch, rhythm, emotion, and speaker style to enhance storytelling quality

Efficiency & Latency: Optimize models for real-time inference and high-scale production, utilizing techniques like knowledge distillation and model quantization

Multilingual Synthesis: Spearhead research into cross-lingual and multilingual TTS to support global content localization

Quality Evaluation: Design and implement robust evaluation frameworks, including MOS (Mean Opinion Score) and objective metrics, to assess the naturalness and intelligibility of generated speech

Qualifications

Speech synthesis · Digital signal processing · Neural TTS systems · Python · Deep learning · Multilingual synthesis · TTS tooling · Research publication · Collaboration

Required

Demonstrated experience in speech synthesis, digital signal processing (DSP), and audio analysis

Proficiency with speech-specific frameworks and libraries such as Coqui TTS, ESPnet, or NVIDIA NeMo

Hands-on experience with sequence-to-sequence models, GANs, Variational Autoencoders (VAEs), and Diffusion models for audio

Experience building high-quality audio datasets, including for voice cloning and speaker verification, and handling prosody

Master's or PhD degree in Computer Science, Machine Learning, or a related field

Significant Python and applied research experience in industrial settings

Proficiency in frameworks such as PyTorch or TensorFlow

Demonstrated experience in deep learning, especially language modeling with transformers and machine translation

Prior experience working with vector databases, search indices, or other data stores for search and retrieval use cases

Preference for fast-paced, collaborative projects with concrete goals, quantitatively tested through A/B experiments

Published research in peer-reviewed journals and conferences on relevant topics

Company

Pocket FM

Pocket FM creates audio series platforms for long-form audio entertainment.

Founded in 2018

Bangalore, Karnataka, IND

501-1000 employees

https://www.pocketfm.com

H1B Sponsorship

Pocket FM has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data from the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not shown)

Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage

Late Stage

Total Funding

$212.52M

Key Investors

Silicon Valley Bank · Lightspeed India Partners · Tencent

2024-03-15 · Series D · $103M

2023-05-02 · Debt Financing · $16M

2022-03-03 · Series C · $64.83M

Leadership Team

Nishanth S.

Co-Founder

Prateek Dixit

Co-Founder

Recent News

IndiaTimes

Pocket FM, Arka Mediaworks to launch new Baahubali audio series

2025-12-13

Inc42 Media

A Year Of Pink Slips: 9,500+ Layoffs Among Indian Startups In 2025

2025-12-06

Livemint.com

Content is king: Box-office duds and OTT saturation fail to deter investors

2025-11-11

Company data provided by Crunchbase