AI Data Ops Lead jobs in United States

Sanas · 2 months ago

AI Data Ops Lead

Sanas is pioneering the future of human communication with its innovative real-time speech transformation platform. The AI Data Ops Lead will be responsible for managing datasets that power speech and language models, designing data pipelines, and ensuring data quality while collaborating with various teams to enhance data collection and reporting processes.

Artificial Intelligence (AI) · Language Learning · SaaS · Software · Translation Service
H1B Sponsor Likely

Responsibilities

Build and maintain internal tools for data collection, labeling, and ingestion
Discover new data sources and prepare them into unified data frames for consumption
Coordinate with multiple stakeholders to ensure timely delivery of high-quality data
Operate and design ETL data pipelines for large-scale audio, text, and metadata
Own data quality: build tooling for quality assurance across all dimensions, discover and fix inaccuracies, and feed what you find back into improving the QA tooling
Analyze dataset coverage, diversity, and quality; monitor bias and data drift
Create dashboards and visual reports tracking data distribution, collection throughput, and collection quality
Work cross-functionally to ensure that the data being made available meets our continuously evolving needs
Run a monthly newsletter reporting changes made to the data and new data sources made available
Design validation experiments for labeled datasets
Implement automated checks for consistency, completeness, and noise reduction
Support research teams with well-documented, high-integrity datasets

Qualifications

Python · Data dashboards · Data analysis · Cloud data infrastructure · Data labeling workflows · Interest in AI research · Communication · Documentation skills

Required

3–6 years of experience in data science, data operations, or ML data workflows
Strong programming skills in Python (pandas, NumPy, SQL, FastAPI or similar)
Proven experience building and maintaining data dashboards (Gradio, Streamlit, Plotly, Dash, Power BI, or similar)
Strong data analysis and visualization skills; comfort working with large, complex datasets
Familiarity with databases and cloud data infrastructure (SQL, DynamoDB, AWS Glue, S3, BigQuery, etc.)
Excellent communication and documentation skills; ability to thrive in a fast-moving AI environment

Preferred

Experience with speech or audio datasets (e.g., ASR, TTS, voice embeddings, or diarization)
Familiarity with data labeling workflows for audio or text
Knowledge of signal processing, spectrogram analysis, or acoustic feature extraction
Experience with data orchestration tools (Dagster, Airflow, etc.)
Experience building custom tooling on an as-needed basis (Retool, Replit, etc.)
Exposure to dataset versioning, evaluation pipelines, and MLOps principles
Interest in advancing the data foundations of AI research

Company

Sanas

Sanas is a real-time speech-understanding platform that modulates accents while preserving voices and emotions for natural interactions.

H1B Sponsorship

Sanas has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (2)
2024 (1)

Funding

Current Stage: Growth Stage
Total Funding: $117.72M
Key Investors: Insight Partners · Human Capital · Village Global
2025-02-19 · Series B · $65M
2023-03-23 · Series Unknown · $14.72M
2022-03-29 · Series A · $32M

Leadership Team

Shawn Zhang
Co-Founder & CTO
Company data provided by Crunchbase