Senior Machine Learning Researcher jobs in United States

Protege · 1 day ago

Senior Machine Learning Researcher

Protege is building a platform to solve the biggest unmet need in AI — getting access to the right training data. The Senior Machine Learning Researcher will lead the evaluation and optimization of large-scale datasets to ensure high-quality data for AI model training, collaborating closely with research and engineering teams.
Analytics · Artificial Intelligence (AI) · Data Management

Responsibilities

Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets
Develop frameworks to assess data diversity, duplication, and informativeness. Design statistical approaches to de-risk training datasets
Collaborate with model training teams to identify data bottlenecks and optimize dataset performance. Emphasis on the ability to work with both teams building large foundation models and smaller startups
Provide leadership on data quality strategy and shape internal best practices
Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance. Help build data scorecards
Contribute to research and development of tools that automate data preprocessing and validation

Qualifications

Machine Learning · Statistical Analysis · Data Quality Strategy · Large-scale Datasets · Data Validation · Synthetic Data Generation · Performance Metrics · Collaboration

Required

PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field
Strong understanding of AI model training pipelines, including pre-processing and evaluation
Experience working with large, unstructured datasets, especially text
Background in statistical analysis, bias detection, and data validation
Able to identify high-impact problems and drive independent solutions

Preferred

Experience with synthetic data generation or augmentation strategies
Publications or open-source contributions in data-centric AI or related areas
Experience developing evaluation frameworks or performance metrics for training data
Experience collaborating cross-functionally with product, infrastructure, or partnership teams

Company

Protege

Protege is the AI training data platform enabling seamless and compliant data exchange.

Funding

Current Stage
Early Stage
Total Funding
$65M
Key Investors
Andreessen Horowitz, Footwork, CRV
2026-01-07 · Series A · $30M
2025-08-13 · Series A · $25M
2024-09-10 · Seed · $10M
Company data provided by Crunchbase