Anthropic · 2 days ago
Research Engineer, Pretraining Scaling
Anthropic is a public benefit corporation focused on creating reliable and beneficial AI systems. The Research Engineer on the ML Performance and Scaling team will ensure the efficient training of production pretrained models, involving responsibilities such as performance optimization, debugging, and collaboration across teams.
Artificial Intelligence (AI) · Foundational AI · Generative AI · Information Technology · Machine Learning
Responsibilities
Own critical aspects of our production pretraining pipeline, including model operations, performance optimization, observability, and reliability
Debug and resolve complex issues across the full stack—from hardware errors and networking to training dynamics and evaluation infrastructure
Design and run experiments to improve training efficiency, reduce step time, increase uptime, and enhance model performance
Respond to on-call incidents during model launches, diagnosing problems quickly and coordinating solutions across teams
Build and maintain production logging, monitoring dashboards, and evaluation infrastructure
Add new capabilities to the training codebase, such as long context support or novel architectures
Collaborate closely with teammates across SF and London, as well as with Tokens, Architectures, and Systems teams
Contribute to the team's institutional knowledge by documenting systems, debugging approaches, and lessons learned
Qualifications
Required
At least a Bachelor's degree in a related field or equivalent experience
Hands-on experience training large language models, or deep expertise with JAX, TPU, PyTorch, or large-scale distributed systems
Enjoy both research and engineering work, with an ideal split of roughly 50/50
Excited about being on-call for production systems, working long days during launches, and solving hard problems under pressure
Thrive when working on whatever is most impactful, even if that changes day-to-day based on what the production model needs
Excel at debugging complex, ambiguous problems across multiple layers of the stack
Communicate clearly and collaborate effectively, especially when coordinating across time zones or during high-stress incidents
Passionate about the work itself and want to refine your craft as a research engineer
Care about the societal impacts of AI and responsible scaling
Preferred
Previous experience training LLMs or working extensively with JAX/TPU, PyTorch, or other ML frameworks at scale
Contributed to open-source LLM frameworks (e.g., open_lm, llm-foundry, mesh-transformer-jax)
Published research on model training, scaling laws, or ML systems
Experience with production ML systems, observability tools, or evaluation infrastructure
Background as a systems engineer, quant, or in other roles requiring both technical depth and operational excellence
Benefits
Equity and benefits
Generous vacation and parental leave
Flexible working hours
Company
Anthropic
Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.
H-1B Sponsorship
Anthropic has a track record of offering H-1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the data below is provided for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025: 105
2024: 13
2023: 3
2022: 4
2021: 1
Funding
Current Stage: Late Stage
Total Funding: $33.74B
Key Investors: Lightspeed Venture Partners, Google, Amazon
2025-09-02: Series F · $13B
2025-05-16: Debt Financing · $2.5B
2025-03-03: Series E · $3.5B
Company data provided by Crunchbase