Anthropic
Senior Research Scientist, Reward Models
Anthropic is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems. The Senior Research Scientist on the Reward Models team will lead research into how models learn from and optimize for human preferences, collaborating across teams to improve both model capabilities and safety.
Artificial Intelligence (AI) · Foundational AI · Generative AI · Information Technology · Machine Learning
Responsibilities
Lead research on novel reward model architectures and training approaches for RLHF (a sketch of the standard training objective follows this list)
Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches that improve consistency and interpretability
Research techniques to detect, characterize, and mitigate reward hacking and specification gaming
Design experiments to understand reward model generalization, robustness, and failure modes
Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines
Contribute to research publications, blog posts, and internal documentation
Mentor other researchers and help build institutional knowledge around reward modeling
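For context on the reward-model training mentioned in the first responsibility, here is a minimal sketch of the standard pairwise (Bradley-Terry) objective widely used to train reward models from human preference comparisons. This is an illustrative outline under assumed names (`RewardModel`, `backbone`, `pairwise_loss`), not Anthropic's internal implementation; any transformer trunk that pools a (prompt, response) pair into a single hidden vector would fit the `backbone` slot.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                     # assumed: LM trunk pooling to (batch, hidden_size)
        self.value_head = nn.Linear(hidden_size, 1)  # maps pooled hidden state to a scalar reward

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)            # (batch, hidden_size)
        return self.value_head(hidden).squeeze(-1)   # (batch,) scalar rewards

def pairwise_loss(model: RewardModel,
                  chosen_ids: torch.Tensor,
                  rejected_ids: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry objective: -log sigmoid(r(chosen) - r(rejected))."""
    reward_chosen = model(chosen_ids)
    reward_rejected = model(rejected_ids)
    # Minimized when the model ranks the human-preferred response higher.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```

In practice the trunk is typically initialized from a pretrained language model and the reward is read from the final token's hidden state; both are common choices rather than requirements of the objective.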
Qualifications
Required
You may be a good fit if you:
Have a track record of research contributions in reward modeling, RLHF, or closely related areas of machine learning
Have experience training and evaluating reward models for large language models
Are comfortable designing and running large-scale experiments with significant computational resources
Can work effectively across research and engineering, iterating quickly while maintaining scientific rigor
Enjoy collaborative research and can communicate complex ideas clearly to diverse audiences
Care deeply about building AI systems that are both highly capable and safe
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience
Preferred
Strong candidates may also:
Have published research on reward modeling, preference learning, or RLHF
Have experience with LLM-as-judge approaches, including calibration and reliability challenges (a minimal rubric-grading sketch follows this list)
Have worked on reward hacking, specification gaming, or related robustness problems (a simple monitoring sketch also follows this list)
Have experience with Constitutional AI, debate, or other scalable oversight approaches
Have contributed to production ML systems at scale
Have familiarity with interpretability techniques as applied to understanding reward model behavior
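For the LLM-as-judge item above, here is a hedged sketch of rubric-driven grading: each criterion is scored separately, which tends to be more consistent and easier to audit than a single holistic judgment. `call_llm`, the rubric entries, and the JSON reply format are hypothetical placeholders, not a real API or Anthropic's method.

```python
import json
from typing import Callable, Dict

# Illustrative rubric; real rubrics are task-specific and more detailed.
RUBRIC: Dict[str, str] = {
    "accuracy":    "Are all factual claims in the response correct?",
    "helpfulness": "Does the response fully address the user's request?",
    "safety":      "Does the response avoid harmful or deceptive content?",
}

def grade(prompt: str, response: str, call_llm: Callable[[str], str]) -> Dict[str, int]:
    """Score each rubric criterion with a separate judge call."""
    scores: Dict[str, int] = {}
    for name, question in RUBRIC.items():
        judge_prompt = (
            f"Prompt: {prompt}\n"
            f"Response: {response}\n\n"
            f"Criterion: {question}\n"
            'Answer with JSON of the form {"score": <integer 1-5>}.'
        )
        reply = call_llm(judge_prompt)  # hypothetical completion API
        scores[name] = json.loads(reply)["score"]
    return scores
```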
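And for the reward-hacking item, one simple signal often used to surface overoptimization: track the gap between the learned proxy reward and a trusted held-out evaluation as RL training proceeds. The function names below are illustrative assumptions, not a prescribed method.

```python
from typing import Callable, Sequence

def overoptimization_gap(
    samples: Sequence[str],
    proxy_reward: Callable[[str], float],  # the trained reward model's score
    gold_eval: Callable[[str], float],     # trusted but expensive evaluation
) -> float:
    """Mean proxy-minus-gold score over a batch of policy samples."""
    gaps = [proxy_reward(s) - gold_eval(s) for s in samples]
    return sum(gaps) / len(gaps)

# Usage: log this gap each RL step. A proxy score that keeps climbing
# while the gold score stalls or falls is a classic symptom of the
# policy exploiting idiosyncrasies of the reward model.
```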
Benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours
A lovely office space in which to collaborate with colleagues
Company
Anthropic
Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.
H-1B Sponsorship
Anthropic has a track record of offering H-1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the figures below are provided for reference (data from the US Department of Labor).
[Chart: distribution of job fields receiving sponsorship; the highlighted field is similar to this role]
Trends of Total Sponsorships: 2021 (1) · 2022 (4) · 2023 (3) · 2024 (13) · 2025 (105)
Funding
Current Stage: Late Stage
Total Funding: $33.74B
Key Investors: Lightspeed Venture Partners, Google, Amazon
2025-09-02 · Series F · $13B
2025-05-16 · Debt Financing · $2.5B
2025-03-03 · Series E · $3.5B
Company data provided by Crunchbase