Research Engineer, Reward Models Training jobs in United States

Anthropic · 16 hours ago

Research Engineer, Reward Models Training

Anthropic is a public benefit corporation dedicated to creating reliable AI systems. The Research Engineer will build infrastructure for training reward models, collaborating with researchers to enhance AI alignment with human values.

Artificial Intelligence (AI), Foundational AI, Generative AI, Information Technology, Machine Learning
H1B Sponsored

Responsibilities

Own the end-to-end engineering of reward model training, from data ingestion through model evaluation and deployment
Design and implement efficient, reliable training pipelines that can scale to increasingly large model sizes
Build robust data pipelines for collecting, processing, and incorporating human feedback into reward model training
Optimize training infrastructure for throughput, efficiency, and fault tolerance across distributed systems
Extend reward model capabilities to support new domains and additional data modalities
Collaborate with researchers to implement and iterate on novel reward modeling techniques
Develop tooling and monitoring systems to ensure training quality and identify issues early
Contribute to the design and improvement of our overall model training infrastructure

Qualifications

Large-scale ML systems, Python, PyTorch, Distributed training systems, Data pipelines, Reinforcement learning, Cloud infrastructure, Collaboration, Problem-solving, Adaptability

Required

Have significant experience building and maintaining large-scale ML systems
Are proficient in Python and have experience with ML frameworks such as PyTorch
Have experience with distributed training systems and optimizing ML workloads for efficiency
Are comfortable working with large datasets and building data pipelines at scale
Can balance research exploration with engineering rigor and operational reliability
Enjoy collaborating closely with researchers and translating research ideas into reliable engineering systems
Are results-oriented with a bias towards flexibility and impact
Can navigate ambiguity and make progress in fast-moving research environments
Adapt quickly to changing priorities, while juggling multiple urgent issues
Maintain clarity when debugging complex, time-sensitive issues
Pick up slack, even if it goes outside your job description
Care about the societal impacts of your work and are motivated by Anthropic's mission
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience

Preferred

Training or fine-tuning large language models
Reinforcement learning from human feedback (RLHF) or related techniques
GPUs, Kubernetes, and cloud infrastructure (AWS, GCP)
Building systems for human-in-the-loop machine learning
Working with multimodal data (text, images, audio, etc.)
Large-scale ETL and data processing frameworks (Spark, Airflow)

Benefits

Equity
Benefits
Incentive compensation
Generous vacation and parental leave
Flexible working hours

Company

Anthropic

Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.

H1B Sponsorship

Anthropic has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data provided by the US Department of Labor)
Trends of Total Sponsorships
2025 (105)
2024 (13)
2023 (3)
2022 (4)
2021 (1)

Funding

Current Stage: Late Stage
Total Funding: $33.74B
Key Investors: Lightspeed Venture Partners, Google, Amazon
2025-09-02 · Series F · $13B
2025-05-16 · Debt Financing · $2.5B
2025-03-03 · Series E · $3.5B

Leadership Team

Dario Amodei · CEO & Co-Founder
Daniela Amodei · President & Co-Founder
Company data provided by Crunchbase