Reflection AI · 21 hours ago

Member of Technical Staff - Data Quality Engineer (Pre-training)

United States

Full-time

Remote

Mid Level

Reflection AI is on a mission to build open superintelligence and make it accessible to all. They are seeking a Data Quality Engineer to ensure that the data used to train their models meets high standards for quality and reliability, directly impacting model performance.

Computer Software

H1B Sponsor Likely

Responsibilities

Own upstream data quality for LLM pre-training, as a specialist or generalist across languages and modalities

Partner closely with research and pre-training teams to translate requirements into measurable quality signals, and provide actionable feedback to external data vendors

Design, validate, and scale automated QA methods, alongside human-in-the-loop processes, to reliably measure data quality across large campaigns

Build reusable QA pipelines that reliably deliver high-quality data to pre-training teams for model training

Monitor and report on data quality over time, driving continuous iteration on quality standards, processes, and acceptance criteria

Qualifications

Python · Building ML workflows · Automated QA methods · Large datasets · LLM familiarity · Analytical mindset · Communication · Detail-oriented

Required

Strong engineering fundamentals with experience building data pipelines, QA systems, or evaluation workflows for pre-training data

Detail-oriented with an analytical mindset, able to identify failure modes, inconsistencies, and subtle issues that affect data quality

Solid understanding of how data quality impacts pre-training, with the ability to translate quality concerns into concrete signals, decisions, and feedback

Experience designing and validating automated quality checks, including rule-based systems, statistical methods, or model-assisted approaches such as LLM-as-a-Judge

Comfortable working autonomously, owning problems end-to-end, and collaborating effectively with researchers, engineers, and operations partners

Proficiency in Python and in building ML / LLM workflows, including debugging and writing scalable code

Experience working with large datasets and automated evaluation or quality-checking systems

Familiarity with how LLMs work, including the ability to describe how models are trained and evaluated

Excellent communication skills with the ability to clearly articulate complex technical concepts across teams

Benefits

Comprehensive medical, dental, vision, life, and disability insurance.

Fully paid parental leave for all new parents, including adoptive and surrogate journeys.

Financial support for family planning.

Paid time off when you need it, relocation support, and more perks that optimize your time.

Lunch and dinner are provided daily.

Regular off-sites and team celebrations.

Company

Reflection AI

Frontier open intelligence accessible to all. Our team previously built frontier LLMs at labs like DeepMind, OpenAI, and Anthropic.

San Francisco, California, US

11-50 employees

https://www.reflection.ai/

H1B Sponsorship

Reflection AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart)

Trends of Total Sponsorships: 2025 (5)

Funding

Current Stage

Early Stage

Company data provided by Crunchbase