Sanas · 2 months ago
AI Data Ops Lead
Sanas is pioneering the future of human communication with its innovative real-time speech transformation platform. The AI Data Ops Lead will be responsible for managing datasets that power speech and language models, designing data pipelines, and ensuring data quality while collaborating with various teams to enhance data collection and reporting processes.
Artificial Intelligence (AI), Language Learning, SaaS, Software, Translation Service
Responsibilities
Build and maintain internal tools for data collection, labeling, and ingestion
Discover new data sources and prepare them into unified data frames for consumption
Coordinate with multiple stakeholders to ensure timely delivery of high-quality data
Operate and design ETL data pipelines for large-scale audio, text, and metadata
Own data quality: build quality-assurance tooling across all dimensions, find and fix inaccuracies, and feed the findings back into the QA tooling
Analyze dataset coverage, diversity, and quality; monitor bias and data drift
Create dashboards and visual reports tracking data distribution, collection throughput, and collection quality
Work cross-functionally to ensure that the data being made available meets our continuously evolving needs
Run a monthly newsletter reporting changes to the data and newly available data sources
Design validation experiments for labeled datasets
Implement automated checks for consistency, completeness, and noise reduction
Support research teams with well-documented, high-integrity datasets
Qualifications
Required
3–6 years of experience in data science, data operations, or ML data workflows
Strong programming skills in Python (pandas, NumPy, SQL, FastAPI or similar)
Proven experience building and maintaining data dashboards (Gradio, Streamlit, Plotly, Dash, Power BI, or similar)
Strong data analysis and visualization skills; comfort working with large, complex datasets
Familiarity with databases and cloud data infrastructure (SQL, DynamoDB, AWS Glue, S3, BigQuery, etc.)
Excellent communication and documentation skills; ability to thrive in a fast-moving AI environment
Preferred
Experience with speech or audio datasets (e.g., ASR, TTS, voice embeddings, or diarization)
Familiarity with data labeling workflows for audio or text
Knowledge of signal processing, spectrogram analysis, or acoustic feature extraction
Experience with data orchestration tools (Dagster, Airflow, etc.)
Experience building custom tooling as needed (Retool, Replit, etc.)
Exposure to dataset versioning, evaluation pipelines, and MLOps principles
Interest in advancing the data foundations of AI research
Company
Sanas
Sanas is a real-time speech-understanding platform that modulates accents while preserving voices and emotions for natural interactions.
H1B Sponsorship
Sanas has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is shown below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (1)
Funding
Current Stage: Growth Stage
Total Funding: $117.72M
Key Investors: Insight Partners, Human Capital, Village Global
2025-02-19 · Series B · $65M
2023-03-23 · Series Unknown · $14.72M
2022-03-29 · Series A · $32M
Company data provided by Crunchbase