Staff Machine Learning Research Scientist, LLM Evals jobs in United States

Scale AI · 9 hours ago

Staff Machine Learning Research Scientist, LLM Evals

Scale AI is a leading data and evaluation partner for frontier AI companies, dedicated to advancing the evaluation and benchmarking of large language models. As a Staff Machine Learning Research Scientist on the LLM Evals team, you will lead the development of novel evaluation methodologies and benchmarks to measure the capabilities of frontier LLMs, driving research that informs both the internal roadmap and the broader research community.

AI Infrastructure, Artificial Intelligence (AI), Data Collection and Labeling, Generative AI, Image Recognition, Machine Learning
H1B Sponsor Likely

Responsibilities

Drive research on the effectiveness and limitations of existing LLM evaluation techniques
Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness
Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects
Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols
Implement scalable and reproducible evaluation pipelines using modern ML frameworks
Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives
Mentor and guide research scientists and engineers, providing technical leadership across cross-functional projects
Stay deeply engaged with the ML research community, tracking emerging work and contributing to the advancement of LLM evaluation science
Thrive in a high-energy, fast-paced startup environment and dedicate the time and effort needed to drive impactful results

Qualifications

Large Language Models, NLP, Transformer Modeling, Evaluation Methodologies, Research Publication, Technical Leadership, Communication Skills, Mentoring, Collaboration

Preferred

5+ years of hands-on experience with large language models, NLP, and Transformer modeling, in both research and engineering development settings
A proven track record of landing major research impact in a fast-paced environment
Experience serving as tech lead for a team of research scientists and research engineers
Excellent written and verbal communication skills
Published machine learning research at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or in journals
Previous experience in a customer-facing role

Benefits

Comprehensive health, dental and vision coverage
Retirement benefits
A learning and development stipend
Generous PTO
Commuter stipend

Company

Scale AI

Scale’s mission is to develop reliable AI systems for the world’s most important decisions.

H1B Sponsorship

Scale AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (82)
2024 (54)
2023 (29)
2022 (17)
2021 (10)
2020 (10)

Funding

Current Stage
Late Stage
Total Funding
$15.9B
Key Investors
Meta, Accel, Tiger Global Management
2025-06-10 · Corporate Round · $14.3B
2025-06-04 · Series Unknown
2024-05-21 · Series F · $1B

Leadership Team

Jason Droege
Interim Chief Executive Officer
Dennis Cinelli
Chief Financial Officer
Company data provided by Crunchbase