
OutcomesAI · 2 months ago

Tech Lead — ASR / TTS / Speech LLM (IC + Mentor)

OutcomesAI is a healthcare technology company focused on building an AI-enabled nursing platform to enhance clinical workflows and patient care. The Tech Lead will oversee the technical development of speech models, guiding a small team in model training and optimization for healthcare applications.

Artificial Intelligence (AI) · Health Care · Medical

Responsibilities

Own the technical roadmap for STT/TTS/Speech LLM model training: from model selection → fine-tuning → deployment
Evaluate and benchmark open-source models (Parakeet, Whisper, etc.) using internal test sets for WER, latency, and entity accuracy
Design and review data pipelines for synthetic and real data generation (text selection, speaker selection, voice synthesis, noise/distortion augmentation)
Architect and optimize training recipes (LoRA/adapters, RNN-T, multi-objective CTC + MWER)
Lead integration with Triton Inference Server (TensorRT/FP16) and ensure K8s autoscaling for 1000+ concurrent streams
Implement Language Model biasing APIs, WFST grammars, and context biasing for domain accuracy
Guide evaluation cycles, drift monitoring, and model switcher/failover strategies
Mentor engineers on data curation, fine-tuning, and model serving best practices
Collaborate with backend/ML-ops for production readiness, observability, and health metrics

Qualifications

Speech models expertise · PyTorch · Triton Inference Server · Kubernetes · Streaming RNN-T · Telephony robustness · Speaker Diarization · TTS frameworks · Speech LLM integration · Code review · Mentorship

Required

M.S. / Ph.D. in Computer Science, Speech Processing, or related field
7–10 years of experience in applied ML, with at least 3 in speech or multimodal AI
Track record of shipping production ASR/TTS models or inference systems at scale
Deep expertise in speech models (ASR, TTS, Speech LLM) and training frameworks (PyTorch, NeMo, ESPnet, Fairseq)
Proven experience with streaming RNN-T / CTC architectures, LoRA/adapters, and TensorRT optimization
Telephony robustness: codec augmentation (G.711 μ-law, Opus, packet loss/jitter), AGC/loudness normalization, band-limiting (300–3400 Hz), far-field/noise simulation
Strong understanding of telephony noise, codecs, and real-world audio variability
Experience with speaker diarization, turn-detection models, and smart voice activity detection
Evaluation: WER/latency curves, Entity-F1 (names/DOB/meds), confidence metrics
TTS: VITS/FastPitch/Glow-TTS/Grad-TTS/StyleTTS2, CosyVoice/NaturalSpeech-3 style transfer, BigVGAN/UnivNet vocoders, zero-shot cloning
Speech LLM: model development and integration with the voice agent pipeline
Experience deploying models with Triton Inference Server, Kubernetes, and GPU scaling
Hands-on with evaluation metrics (WER, F1 on entities, latency p50/p95)
Familiarity with LM biasing, WFST grammars, and context injection
Strong mentorship and code-review discipline

Company

OutcomesAI

OutcomesAI operates as a healthcare company.

Funding

Current Stage
Early Stage
Total Funding
$10M
Key Investors
Sante Ventures
2025-10-14 · Seed · $10M

Leadership Team

Kuldeep Rajput
Founder & CEO
Company data provided by Crunchbase