Anthropic · 9 hours ago

Software Engineer, AI Reliability

New York, NY

Full-time

Hybrid

Mid, Senior Level

$325K/yr - $485K/yr

Anthropic is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems. It is seeking a Software Engineer in AI Reliability to improve system reliability across critical serving paths and to work with other teams on making its AI services more robust.

Artificial Intelligence (AI), Foundational AI, Generative AI, Information Technology, Machine Learning

H1B Sponsored

Responsibilities

Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity

Design and implement monitoring and observability systems across the token path

Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers

Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements

Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic's safety commitments

Qualifications

Distributed systems, Reliability engineering, Large-scale infrastructure, ML hardware accelerators, AI observability tools, Chaos engineering, Communication skills, Collaboration skills

Required

Bachelor's degree in a related field or equivalent experience

Strong background in distributed systems, infrastructure, or reliability

Curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don't have deep expertise yet

Think holistically about how systems compose and where the seams are

Can build lasting relationships across teams

Care about users and feel ownership over outcomes, even for systems you don't own

Excellent communication and collaboration skills

Diverse experience in building product stacks, scaling databases, running massive distributed systems, and everything in between

Preferred

Experience as an SRE, Production Engineer, or in a similar reliability-focused role on large-scale systems

Experience operating large-scale model serving or training infrastructure (>1000 GPUs)

Experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium)

Understanding of ML-specific networking optimizations like RDMA and InfiniBand

Expertise in AI-specific observability tools and frameworks

Experience with chaos engineering and systematic resilience testing

Contributions to open-source infrastructure or ML tooling

Benefits

Competitive compensation and benefits

Optional equity donation matching

Generous vacation and parental leave

Flexible working hours

A lovely office space in which to collaborate with colleagues

Company

Anthropic

Anthropic is an AI research company focused on AI safety and on aligning AI systems with human values.

Founded in 2021

San Francisco, California, USA

501-1000 employees

https://www.anthropic.com

H1B Sponsorship

Anthropic has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not reproduced; the highlighted segment represents job fields similar to this job)

Trends of Total Sponsorships

2025 (105)

2024 (13)

2023 (3)

2022 (4)

2021 (1)

Funding

Current Stage

Late Stage

Total Funding

$33.74B

Key Investors

Fidelity, ICONIQ Capital, Lightspeed Venture Partners, Google

2025-09-02 · Series F · $13B

2025-05-16 · Debt Financing · $2.5B

2025-03-03 · Series E · $3.5B

Leadership Team

Dario Amodei

Co-Founder and CEO

Daniela Amodei

President and Co-Founder

Recent News

Hedgeweek

Hedge funds suffer sharp losses amid AI-led tech sell-off

2026-02-07

Inc42 Media

Anthropic Sparks SaaSpocalypse, Deeptechs Get A Booster Shot & More

2026-02-07

TechCrunch

Maybe AI agents can be lawyers after all

2026-02-07

Company data provided by Crunchbase