
Anthropic · 1 day ago

Senior Research Scientist, Reward Models

Anthropic is a public benefit corporation dedicated to creating reliable and beneficial AI systems. The company is seeking a Senior Research Scientist to lead research on improving how human preferences are specified and learned at scale, with a focus on reward modeling for large language models.

Artificial Intelligence (AI) · Foundational AI · Generative AI · Information Technology · Machine Learning
H1B Sponsored

Responsibilities

Lead research on novel reward model architectures and training approaches for RLHF
Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability
Research techniques to detect, characterize, and mitigate reward hacking and specification gaming
Design experiments to understand reward model generalization, robustness, and failure modes
Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines
Contribute to research publications, blog posts, and internal documentation
Mentor other researchers and help build institutional knowledge around reward modeling

Qualifications

Reward modeling · RLHF · Large language models · Experiment design · Collaborative research · Interpretability techniques · Communication skills · Mentoring

Required

Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning
Have experience training and evaluating reward models for large language models
Are comfortable designing and running large-scale experiments with significant computational resources
Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor
Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences
Care deeply about building AI systems that are both highly capable and safe
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience

Preferred

Have published research on reward modeling, preference learning, or RLHF
Have experience with LLM-as-judge approaches, including calibration and reliability challenges
Have worked on reward hacking, specification gaming, or related robustness problems
Have experience with constitutional AI, debate, or other scalable oversight approaches
Have contributed to production ML systems at scale
Have familiarity with interpretability techniques as applied to understanding reward model behavior

Benefits

Equity
Incentive compensation
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours

Company

Anthropic

Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.

H1B Sponsorship

Anthropic has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role. The data below is provided for reference. (Data powered by the US Department of Labor)
[Chart: distribution of job fields receiving sponsorship, highlighting fields similar to this role]
Trends of Total Sponsorships
2025 (105)
2024 (13)
2023 (3)
2022 (4)
2021 (1)

Funding

Current Stage
Late Stage
Total Funding
$33.74B
Key Investors
Lightspeed Venture Partners · Google · Amazon
2025-09-02 · Series F · $13B
2025-05-16 · Debt Financing · $2.5B
2025-03-03 · Series E · $3.5B

Leadership Team

Dario Amodei, CEO & Co-Founder
Daniela Amodei, President & Co-Founder
Company data provided by Crunchbase.