Anthropic · 2 days ago
Research Engineer, Pretraining Scaling
Anthropic is a public benefit corporation focused on creating reliable and beneficial AI systems. The Research Engineer on the ML Performance and Scaling team will ensure the efficient training of production pretrained models, involving responsibilities such as performance optimization, debugging, and collaboration across teams.
Artificial Intelligence (AI) · Foundational AI · Generative AI · Information Technology · Machine Learning
Responsibilities
Own critical aspects of our production pretraining pipeline, including model operations, performance optimization, observability, and reliability
Debug and resolve complex issues across the full stack—from hardware errors and networking to training dynamics and evaluation infrastructure
Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance
Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams
Build and maintain production logging, monitoring dashboards, and evaluation infrastructure
Add new capabilities to the training codebase, such as long context support or novel architectures
Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams
Contribute to the team's institutional knowledge by documenting systems, debugging approaches, and lessons learned
Qualifications
Required
At least a Bachelor's degree in a related field or equivalent experience
Hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems
Enjoy both research and engineering work, with an ideal split of roughly 50/50
Excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure
Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs
Excel at debugging complex, ambiguous problems across multiple layers of the stack
Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents
Passionate about the work itself and want to refine your craft as a research engineer
Care about the societal impacts of AI and responsible scaling
Preferred
Previous experience training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale
Contributed to open-source LLM frameworks (e.g., open_lm, llm-foundry, mesh-transformer-jax)
Published research on model training, scaling laws, or ML systems
Experience with production ML systems, observability tools, or evaluation infrastructure
Background as a systems engineer, quant, or in other roles requiring both technical depth and operational excellence
Benefits
Equity and benefits
Generous vacation and parental leave
Flexible working hours
Company
Anthropic
Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.
H-1B Sponsorship
Anthropic has a track record of offering H-1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the data below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025: 105
2024: 13
2023: 3
2022: 4
2021: 1
Funding
Current Stage: Late Stage
Total Funding: $33.74B
Key Investors: Lightspeed Venture Partners, Google, Amazon
2025-09-02: Series F · $13B
2025-05-16: Debt Financing · $2.5B
2025-03-03: Series E · $3.5B
Company data provided by Crunchbase