Scale AI · 1 day ago

AI Research Engineer, Enterprise Evaluations

San Francisco Bay Area

Full-time

Hybrid

Entry, Mid Level

$179K/yr - $224K/yr

2+ years exp

Scale AI is seeking a technically rigorous and driven AI Research Engineer to join its Enterprise Evaluations team. This high-impact role focuses on developing and maintaining AI evaluation systems that ensure safety and reliability in LLM-powered workflows for enterprise clients.

AI Infrastructure, Artificial Intelligence (AI), Data Collection and Labeling, Generative AI, Image Recognition, Machine Learning

H1B Sponsor Likely

Responsibilities

Partner with Scale’s Operations team and enterprise customers to translate ambiguity into structured evaluation data, guiding the creation and maintenance of gold-standard human-rated datasets and expert rubrics that anchor AI evaluation systems

Analyze feedback and collected data to identify patterns, refine evaluation frameworks, and establish iterative improvement loops that enhance the quality and relevance of human-curated assessments

Design, research, and develop LLM-as-a-Judge autorater frameworks and AI-assisted evaluation systems. This includes creating models that critique, grade, and explain agent outputs (e.g., RLAIF, model-judging-model setups), along with scalable evaluation pipelines and diagnostic tools

Pursue research initiatives that explore new methodologies for automatically analyzing, evaluating, and improving the behavior of enterprise agents, pushing the boundaries of how AI systems are assessed and optimized in real-world contexts

Qualifications

Large Language Models, Machine Learning, Generative AI, Python, ML frameworks, Statistical analysis, Model evaluation methodologies, Research skills, Collaboration, Problem-solving

Required

Bachelor's degree in Computer Science, Electrical Engineering, a related field, or equivalent practical experience

2+ years of experience in Machine Learning or Applied Research, focused on applied ML systems or evaluation infrastructure

Hands-on experience with Large Language Models (LLMs) and Generative AI in professional or research environments

Strong understanding of frontier model evaluation methodologies and the current research landscape

Proficiency in Python and major ML frameworks (e.g., PyTorch, TensorFlow)

Solid engineering and statistical analysis foundation, with experience developing data-driven methods for assessing model quality

Preferred

Advanced degree (Master's or Ph.D.) in Computer Science, Machine Learning, or a related quantitative field

Published research in leading ML or AI conferences such as NeurIPS, ICML, ICLR, or KDD

Experience designing, building, or deploying LLM-as-a-Judge frameworks or other automated evaluation systems for complex models

Experience collaborating with operations or external teams to define high-quality human annotator guidelines

Expertise in ML research engineering, stochastic systems, observability, or LLM-powered applications for model evaluation and analysis

Experience contributing to scalable pipelines that automate the evaluation and monitoring of large-scale models and agents

Familiarity with distributed computing frameworks and modern cloud infrastructure

Benefits

Comprehensive health, dental and vision coverage

Retirement benefits

A learning and development stipend

Generous PTO

Commuter stipend

Company

Scale AI

Scale’s mission is to develop reliable AI systems for the world’s most important decisions.

Founded in 2016

San Francisco, California, USA

501-1000 employees

https://scale.com

H1B Sponsorship

Scale AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

[Chart: Distribution of Different Job Fields Receiving Sponsorship]

Trends of Total Sponsorships

2025 (82)

2024 (54)

2023 (29)

2022 (17)

2021 (10)

2020 (10)

Funding

Current Stage

Late Stage

Total Funding

$15.9B

Key Investors

Meta, Accel, Dragoneer Investment Group, Greenoaks, Tiger Global Management

2025-06-10 · Corporate Round · $14.3B

2025-06-04 · Series Unknown

2024-05-21 · Series F · $1B

Leadership Team

Jason Droege

Interim Chief Executive Officer

Clemens Viernickel

Head of Product

Recent News

IndiaTimes

Senator Elizabeth Warren, 2 others send letter to FTC and DOJ: Inspect AI deals from Nvidia, Meta and Google for…

2026-02-05

PR Newswire

Raft Launches Partner Program to Accelerate Department of War Modernization

2026-01-16

CB Insights

State of Venture 2025

2026-01-09

Company data provided by Crunchbase