Scale AI · 1 day ago
Staff Machine Learning Research Scientist, LLM Evals
Scale AI is a leading data and evaluation partner for frontier AI companies, dedicated to advancing the evaluation and benchmarking of large language models. As a Staff Machine Learning Research Scientist on the LLM Evals team, you will lead the development of novel evaluation methodologies and benchmarks to measure the capabilities of frontier LLMs, driving research that informs both the internal roadmap and the broader research community.
AI Infrastructure · Artificial Intelligence (AI) · Data Collection and Labeling · Generative AI · Image Recognition · Machine Learning
Responsibilities
Drive research on the effectiveness and limitations of existing LLM evaluation techniques
Design and develop novel evaluation benchmarks for large language models, covering areas such as instruction following, factuality, robustness, and fairness
Communicate, collaborate, and build relationships with clients and peer teams to facilitate cross-functional projects
Collaborate with internal teams and external partners to refine metrics and create standardized evaluation protocols
Implement scalable and reproducible evaluation pipelines using modern ML frameworks
Publish research findings in top-tier AI conferences and contribute to open-source benchmarking initiatives
Mentor and guide research scientists and engineers, providing technical leadership across cross-functional projects
Stay deeply engaged with the ML research community, tracking emerging work and contributing to the advancement of LLM evaluation science
Thrive in a high-energy, fast-paced startup environment and be ready to dedicate the time and effort needed to drive impactful results
Qualifications
Preferred
5+ years of hands-on experience with large language models, NLP, and Transformer modeling, in both research and engineering settings
Experience and a track record of landing major research impact in a fast-paced environment
Experience as tech lead of a team of research scientists and research engineers
Excellent written and verbal communication skills
Published research in areas of machine learning at major conferences (NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR, etc.) and/or journals
Previous experience in a customer-facing role
Benefits
Comprehensive health, dental and vision coverage
Retirement benefits
A learning and development stipend
Generous PTO
Commuter stipend
Company
Scale AI
Scale’s mission is to develop reliable AI systems for the world’s most important decisions.
H1B Sponsorship
Scale AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (82)
2024 (54)
2023 (29)
2022 (17)
2021 (10)
2020 (10)
Funding
Current Stage: Late Stage
Total Funding: $15.9B
Key Investors: Meta, Accel, Tiger Global Management
2025-06-10 · Corporate Round · $14.3B
2025-06-04 · Series Unknown
2024-05-21 · Series F · $1B
Company data provided by Crunchbase