Anthropic
Senior Research Scientist, Reward Models
Anthropic is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems. The Senior Research Scientist on the Reward Models team will lead research into how models learn from and optimize for human preferences, collaborating across teams to improve both model capabilities and safety.
Artificial Intelligence (AI) · Foundational AI · Generative AI · Information Technology · Machine Learning
Responsibilities
Lead research on novel reward model architectures and training approaches for RLHF (a sketch of the standard training objective follows this list)
Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability
Research techniques to detect, characterize, and mitigate reward hacking and specification gaming
Design experiments to understand reward model generalization, robustness, and failure modes
Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines
Contribute to research publications, blog posts, and internal documentation
Mentor other researchers and help build institutional knowledge around reward modeling
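For context on the reward-model training mentioned in the first responsibility, here is a minimal sketch of the standard pairwise (Bradley-Terry) objective widely used to train reward models from human preference comparisons. This is an illustrative outline under assumed names (`RewardModel`, `backbone`, `pairwise_loss`), not Anthropic's internal implementation; any transformer trunk that pools a (prompt, response) pair into a single hidden vector would fit the `backbone` slot.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                     # assumed: LM trunk pooling to (batch, hidden_size)
        self.value_head = nn.Linear(hidden_size, 1)  # maps pooled hidden state to a scalar reward

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)            # (batch, hidden_size)
        return self.value_head(hidden).squeeze(-1)   # (batch,) scalar rewards

def pairwise_loss(model: RewardModel,
                  chosen_ids: torch.Tensor,
                  rejected_ids: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective: -log sigmoid(r(chosen) - r(rejected))."""
    reward_chosen = model(chosen_ids)
    reward_rejected = model(rejected_ids)
    # Minimized when the model ranks the human-preferred response higher.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```

In practice the trunk is typically initialized from a pretrained language model and the reward is read from the final token's hidden state; both are common choices rather than requirements of the objective.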
Qualifications
Required
You may be a good fit if you:
Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning
Have experience training and evaluating reward models for large language models
Are comfortable designing and running large-scale experiments with significant computational resources
Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor
Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences
Care deeply about building AI systems that are both highly capable and safe
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience
Preferred
Strong candidates may also:
Have published research on reward modeling, preference learning, or RLHF
Have experience with LLM-as-judge approaches, including calibration and reliability challenges (a minimal rubric-grading sketch follows this list)
Have worked on reward hacking, specification gaming, or related robustness problems (a simple monitoring sketch also follows this list)
Have experience with Constitutional AI, debate, or other scalable oversight approaches
Have contributed to production ML systems at scale
Have familiarity with interpretability techniques as applied to understanding reward model behavior
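For the LLM-as-judge item above, here is a hedged sketch of rubric-driven grading: each criterion is scored separately, which tends to be more consistent and easier to audit than a single holistic judgment. `call_llm`, the rubric entries, and the JSON reply format are hypothetical placeholders, not a real API or Anthropic's method.

```python
import json
from typing import Callable, Dict

# Illustrative rubric; real rubrics are task-specific and more detailed.
RUBRIC: Dict[str, str] = {
    "accuracy":    "Are all factual claims in the response correct?",
    "helpfulness": "Does the response fully address the user's request?",
    "safety":      "Does the response avoid harmful or deceptive content?",
}

def grade(prompt: str, response: str, call_llm: Callable[[str], str]) -> Dict[str, int]:
    """Score each rubric criterion with a separate judge call."""
    scores: Dict[str, int] = {}
    for name, question in RUBRIC.items():
        judge_prompt = (
            f"Prompt: {prompt}\n"
            f"Response: {response}\n\n"
            f"Criterion: {question}\n"
            'Answer with JSON of the form {"score": <integer 1-5>}.'
        )
        reply = call_llm(judge_prompt)  # hypothetical completion API
        scores[name] = json.loads(reply)["score"]
    return scores
```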
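And for the reward-hacking item, one simple signal often used to surface overoptimization: track the gap between the learned proxy reward and a trusted held-out evaluation as RL training proceeds. The function names below are illustrative assumptions, not a prescribed method.

```python
from typing import Callable, Sequence

def overoptimization_gap(
    samples: Sequence[str],
    proxy_reward: Callable[[str], float],  # the trained reward model's score
    gold_eval: Callable[[str], float],     # trusted but expensive evaluation
) -> float:
    """Mean proxy-minus-gold score over a batch of policy samples."""
    gaps = [proxy_reward(s) - gold_eval(s) for s in samples]
    return sum(gaps) / len(gaps)

# Usage: log this gap each RL step. A proxy score that keeps climbing
# while the gold score stalls or falls is a classic symptom of the
# policy exploiting idiosyncrasies of the reward model.
```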
Benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours
A lovely office space in which to collaborate with colleagues
Company
Anthropic
Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.
H-1B Sponsorship
Anthropic has a track record of offering H-1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the figures below are provided for reference (data from the US Department of Labor).
[Chart: distribution of job fields receiving sponsorship; the highlighted field is similar to this role]
Trends of Total Sponsorships: 2021 (1) · 2022 (4) · 2023 (3) · 2024 (13) · 2025 (105)
Funding
Current Stage: Late Stage
Total Funding: $33.74B
Key Investors: Lightspeed Venture Partners, Google, Amazon
2025-09-02 · Series F · $13B
2025-05-16 · Debt Financing · $2.5B
2025-03-03 · Series E · $3.5B
Company data provided by Crunchbase