Anthropic · 16 hours ago

Research Engineer, Reward Models Training

New York, NY

Full-time

Hybrid

Senior Level

$350K/yr - $500K/yr

5+ years exp

Anthropic is a public benefit corporation dedicated to creating reliable AI systems. The Research Engineer will build infrastructure for training reward models, collaborating with researchers to enhance AI alignment with human values.

Artificial Intelligence (AI), Foundational AI, Generative AI, Information Technology, Machine Learning

H1B Sponsored

Responsibilities

Own the end-to-end engineering of reward model training, from data ingestion through model evaluation and deployment

Design and implement efficient, reliable training pipelines that can scale to increasingly large model sizes

Build robust data pipelines for collecting, processing, and incorporating human feedback into reward model training

Optimize training infrastructure for throughput, efficiency, and fault tolerance across distributed systems

Extend reward model capabilities to support new domains and additional data modalities

Collaborate with researchers to implement and iterate on novel reward modeling techniques

Develop tooling and monitoring systems to ensure training quality and identify issues early

Contribute to the design and improvement of our overall model training infrastructure

Qualifications

Large-scale ML systems, Python, PyTorch, Distributed training systems, Data pipelines, Reinforcement learning, Cloud infrastructure, Collaboration, Problem-solving, Adaptability

Required

Have significant experience building and maintaining large-scale ML systems

Are proficient in Python and have experience with ML frameworks such as PyTorch

Have experience with distributed training systems and optimizing ML workloads for efficiency

Are comfortable working with large datasets and building data pipelines at scale

Can balance research exploration with engineering rigor and operational reliability

Enjoy collaborating closely with researchers and translating research ideas into reliable engineering systems

Are results-oriented with a bias towards flexibility and impact

Can navigate ambiguity and make progress in fast-moving research environments

Adapt quickly to changing priorities while juggling multiple urgent issues

Maintain clarity when debugging complex, time-sensitive issues

Pick up slack, even if it goes outside your job description

Care about the societal impacts of your work and are motivated by Anthropic's mission

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience

Preferred

Training or fine-tuning large language models

Reinforcement learning from human feedback (RLHF) or related techniques

GPUs, Kubernetes, and cloud infrastructure (AWS, GCP)

Building systems for human-in-the-loop machine learning

Working with multimodal data (text, images, audio, etc.)

Large-scale ETL and data processing frameworks (Spark, Airflow)

Benefits

Equity

Incentive compensation

Generous vacation and parental leave

Flexible working hours

Company

Anthropic

Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.

Founded in 2021

San Francisco, California, USA

501-1000 employees

https://www.anthropic.com

H1B Sponsorship

Anthropic has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference (data from the US Department of Labor).

[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted segment represents job fields similar to this job]

Trends of Total Sponsorships

2025 (105)

2024 (13)

2023 (3)

2022 (4)

2021 (1)

Funding

Current Stage

Late Stage

Total Funding

$33.74B

Key Investors

Lightspeed Venture Partners, Google, Amazon

2025-09-02 · Series F · $13B

2025-05-16 · Debt Financing · $2.5B

2025-03-03 · Series E · $3.5B

Leadership Team

Dario Amodei

CEO & Co-Founder

Daniela Amodei

President & Co-Founder

Recent News

Venture Capital

Anthropic Plans to Raise $10 Billion at $350 Billion Valuation

2026-01-09

IndiaTimes

Anthropic President Daniela Amodei calls the AI concept that OpenAI and other companies are pouring billions in 'outdated'

2026-01-08

IndiaTimes

CEO Jensen Huang confirms, Nvidia’s $500 billion AI demand outlook won't …

2026-01-08

Company data provided by Crunchbase