Scale AI · 10 hours ago

Staff Machine Learning Research Scientist, LLM Evals

Seattle, WA

Full-time

Onsite

Senior Level, Lead/Staff

$280K/yr - $380K/yr

5+ years exp

Scale AI is a leading data and evaluation partner for frontier AI companies, dedicated to advancing the evaluation and benchmarking of large language models. As a Staff Machine Learning Research Scientist on the LLM Evals team, you will lead the development of novel evaluation methodologies and benchmarks to measure the capabilities of frontier LLMs, driving research that informs both the internal roadmap and the broader research community.

AI Infrastructure, Artificial Intelligence (AI), Data Collection and Labeling, Generative AI, Image Recognition, Machine Learning

H1B Sponsor Likely

Responsibilities

Drive research on the effectiveness and limitations of existing LLM evaluation techniques

Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness

Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects

Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols

Implement scalable and reproducible evaluation pipelines using modern ML frameworks

Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives

Mentor and guide research scientists and engineers, providing technical leadership across cross-functional projects

Stay deeply engaged with the ML research community, tracking emerging work and contributing to the advancement of LLM evaluation science

Thrive in a high-energy, fast-paced startup environment, ready to dedicate the time and effort needed to drive impactful results

Qualifications

Large Language Models, NLP, Transformer Modeling, Evaluation Methodologies, Research Publication, Technical Leadership, Communication Skills, Mentoring, Collaboration

Preferred

5+ years of hands-on experience with large language models, NLP, and Transformer modeling, in both research and engineering settings

Experience and a track record of landing major research impact in a fast-paced environment

Experience as tech lead of a team of research scientists and research engineers

Excellent written and verbal communication skills

Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals

Previous experience in a customer-facing role

Benefits

Comprehensive health, dental and vision coverage

Retirement benefits

A learning and development stipend

Generous PTO

Commuter stipend

Company

Scale AI

Scale’s mission is to develop reliable AI systems for the world’s most important decisions.

Founded in 2016

San Francisco, California, USA

501-1000 employees

https://scale.com

H1B Sponsorship

Scale AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)

Trends of Total Sponsorships

2025 (82)

2024 (54)

2023 (29)

2022 (17)

2021 (10)

2020 (10)

Funding

Current Stage

Late Stage

Total Funding

$15.9B

Key Investors

Meta, Accel, Tiger Global Management

2025-06-10 · Corporate Round · $14.3B

2025-06-04 · Series Unknown

2024-05-21 · Series F · $1B

Leadership Team

Jason Droege

Interim Chief Executive Officer

Dennis Cinelli

Chief Financial Officer

Recent News

CB Insights

State of Venture 2025

2026-01-09

Crunchbase News

Global Venture Funding In 2025 Surged As Startup Deals And Valuations Set All-Time Records

2026-01-07

Benzinga.com

Former Meta Scientist Says Mark Zuckerberg's New AI Chief Is 'Young' And 'Inexperienced'—Warns 'Lot Of People' Who Haven't Yet Left Meta 'Will Leave'

2026-01-05

Company data provided by Crunchbase