Protege · 4 months ago
Forward-Deployed Data Scientist (Media Curation & Delivery)
Protege is focused on addressing the critical need for accessible training data in AI, facilitating a secure and efficient exchange of such data. The Forward-Deployed Data Scientist will work closely with sales, product, and account management teams to curate and deliver tailored media datasets for AI model training, leveraging their expertise in the media catalog and ensuring high-quality data delivery.
Analytics, Artificial Intelligence (AI), Data Management
Responsibilities
Work with Sales and Account Management to interpret customer requirements and translate them into curation strategies
Query and analyze Protege’s media catalog (SQL, internal APIs, and metadata tools) to identify relevant content
Use AI tools and transcoded embeddings to surface and refine clip-level content
Conduct iterative sample reviews with customers — gathering feedback, refining selections, and ensuring final packages meet spec
Develop a deep understanding of Protege’s media catalog structure, metadata, and growth patterns
Track and analyze content coverage, diversity, and modality mix; identify gaps relative to customer demand
Partner with Product and Partnerships to feed back catalog insights that inform sourcing priorities
Collaborate cross-functionally to ensure content packaging aligns with technical, ethical, and licensing requirements
Develop methods, scripts, or internal tools that make curation more efficient and scalable
Support the evolution of Protege’s delivery platform — helping define how internal users and customers search, sample, and export data
Work closely with embedding-based systems to iterate between algorithmic selection and human review
Define best practices for embedding queries, relevance evaluation, and content diversity
Push for operational excellence and quality assurance at every stage
Qualifications
Required
4–7 years of experience in data science, media analytics, or technical curation roles
Strong proficiency in SQL; you're comfortable writing complex queries to slice large datasets and generate insights
Comfortable working with media metadata, embeddings, and unstructured content
Experience collaborating with sales, account management, or customer success teams on technically nuanced deliverables
Strong analytical instincts — you enjoy exploring data, pattern-matching, and translating findings into action
Detail-oriented with a high standard for data quality and usability
Excellent communicator who can navigate between technical depth and customer-friendly clarity
Thrives in ambiguous, fast-moving environments with a mix of structure and creativity
You treat those around you with kindness
Preferred
Familiarity with video/audio processing, embeddings, or multimodal AI workflows
Prior experience curating or packaging datasets for machine learning
Background in content analysis, recommendation systems, or information retrieval
Benefits
Competitive compensation
Equity
Benefits package
Company
Protege
Protege is the AI training data platform enabling seamless and compliant data exchange.
Funding
Current Stage: Early Stage
Total Funding: $65M
Key Investors: Andreessen Horowitz, Footwork, CRV
2026-01-07 · Series A · $30M
2025-08-13 · Series A · $25M
2024-09-10 · Seed · $10M
Company data provided by Crunchbase