Anthropic · 8 hours ago
Software Engineer, AI Reliability
Anthropic is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems. They are seeking a Software Engineer in AI Reliability to improve system reliability across critical serving paths and collaborate with various teams to enhance the robustness of their AI services.
Artificial Intelligence (AI), Foundational AI, Generative AI, Information Technology, Machine Learning
Responsibilities
Develop appropriate Service Level Objectives for large language model serving systems, balancing availability and latency with development velocity
Design and implement monitoring and observability systems across the token path
Assist in the design and implementation of high-availability serving infrastructure across multiple regions and cloud providers
Lead incident response for critical AI services, ensuring rapid recovery, thorough incident reviews, and systematic improvements
Support the reliability of safeguard model serving -- critical for both site reliability and Anthropic's safety commitments
Qualifications
Required
Bachelor's degree in a related field or equivalent experience
Strong background in distributed systems, infrastructure, or reliability
Curious and brave -- comfortable jumping into unfamiliar systems during an incident and helping drive resolution even when you don't have deep expertise yet
Think holistically about how systems compose and where the seams are
Can build lasting relationships across teams
Care about users and feel ownership over outcomes, even for systems you don't own
Excellent communication and collaboration skills
Diverse experience in building product stacks, scaling databases, running massive distributed systems, and everything in between
Preferred
Experience as an SRE, Production Engineer, or in similar reliability-focused roles on large scale systems
Experience operating large-scale model serving or training infrastructure (>1000 GPUs)
Experience with one or more ML hardware accelerators (GPUs, TPUs, Trainium)
Understanding of ML-specific networking optimizations like RDMA and InfiniBand
Expertise in AI-specific observability tools and frameworks
Experience with chaos engineering and systematic resilience testing
Contributions to open-source infrastructure or ML tooling
Benefits
Competitive compensation and benefits
Optional equity donation matching
Generous vacation and parental leave
Flexible working hours
A lovely office space in which to collaborate with colleagues
Company
Anthropic
Anthropic is an AI research company that focuses on the safety and alignment of AI systems with human values.
H1B Sponsorship
Anthropic has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not reproduced; the highlighted segment represents job fields similar to this job)
Trends of Total Sponsorships
2025 (105)
2024 (13)
2023 (3)
2022 (4)
2021 (1)
Funding
Current Stage: Late Stage
Total Funding: $33.74B
Key Investors: Fidelity, ICONIQ Capital, Lightspeed Venture Partners, Google
2025-09-02: Series F, $13B
2025-05-16: Debt Financing, $2.5B
2025-03-03: Series E, $3.5B
Company data provided by Crunchbase