Reflection AI · 1 week ago

Member of Technical Staff - Safety Lead

San Francisco, CA

Full-time

Onsite

Senior Level

Reflection AI is on a mission to build open superintelligence and make it accessible to all. They are seeking a Safety Lead to own the adversarial evaluation pipeline for their models, ensuring safety and reliability in deployment through collaboration with the Alignment team and the development of automated safety benchmarks.

Computer Software

H1B Sponsor Likely

Responsibilities

Own the red-teaming and adversarial evaluation pipeline for Reflection’s models, continuously probing for failure modes across security, misuse, and alignment gaps

Work hand-in-hand with the Alignment team to translate safety findings into concrete guardrails, ensuring models behave reliably under stress and adhere to deployment policies

Validate that every release meets the lab’s risk thresholds before it ships, serving as a critical gatekeeper for our open-weight releases

Develop scalable, automated safety benchmarks that evolve alongside our model capabilities, moving beyond static datasets to dynamic adversarial testing

Research and implement state-of-the-art jailbreaking techniques and defenses to stay ahead of potential vulnerabilities in the wild

Qualifications

LLM safety, Adversarial attacks, Red-teaming methodologies, Automated evaluation pipelines, Large-scale ML systems, Reinforcement Learning, Software engineering, High-stakes decision making, Passion for intelligence, Fast-paced environment

Required

Graduate degree (MS or PhD) in Computer Science, Machine Learning, or related discipline, or equivalent practical experience in AI Safety

Deep technical understanding of LLM safety, including adversarial attacks, red-teaming methodologies, and interpretability

Strong software engineering capabilities with experience building automated evaluation pipelines or large-scale ML systems

Thrive in a fast-paced, high-agency startup environment with a bias toward action

Willing to make high-stakes decisions regarding model release and safety thresholds

Passionate about advancing the frontier of intelligence

Preferred

Experience with Reinforcement Learning (RLHF/RLAIF) and how it impacts model safety and alignment is a strong plus

Benefits

Comprehensive medical, dental, vision, life, and disability insurance.

Fully paid parental leave for all new parents, including adoptive and surrogate journeys.

Financial support for family planning.

Paid time off when you need it, relocation support, and more perks that optimize your time.

Lunch and dinner are provided daily.

Regular off-sites and team celebrations.

Company

Reflection AI

Frontier open intelligence accessible to all. Our team previously built frontier LLMs at labs like DeepMind, OpenAI, and Anthropic.

New York, NY, US

11-50 employees

https://www.reflection.ai/

H1B Sponsorship

Reflection AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)

Total sponsorships by year: 2025 (5)

Funding

Current Stage

Early Stage

Company data provided by Crunchbase