DeepRec.ai
LLM Evaluation Engineering Lead
DeepRec.ai is a deep-tech AI company focused on building autonomous systems for complex environments. They are seeking an LLM Evaluation Engineering Lead to own the evaluation and verification processes for agentic LLM systems, ensuring that these systems improve over time and behave reliably.
Responsibilities
Build eval harnesses for agentic LLM systems (offline + in-workflow)
Design evals for planning, execution, recovery, and safety
Implement verifier-driven scoring and regression gates
Turn eval failures into training signals (SFT / DPO / RL)
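To make the responsibilities above concrete, here is a minimal sketch of what a verifier-driven regression gate might look like. All names (`EvalCase`, `pass_rate`, `regression_gate`, the toy echo model) are hypothetical illustrations, not DeepRec.ai's actual stack: each eval case carries a programmatic verifier, and a candidate model is gated on whether its pass rate regresses below a stored baseline.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class EvalCase:
    # Hypothetical eval case: a prompt plus a programmatic verifier
    # that returns True if the model's output passes.
    prompt: str
    verifier: Callable[[str], bool]

def pass_rate(cases: List[EvalCase], model: Callable[[str], str]) -> float:
    # Score each output with its verifier and aggregate into a pass rate.
    passed = sum(case.verifier(model(case.prompt)) for case in cases)
    return passed / len(cases)

def regression_gate(rate: float, baseline: float, tolerance: float = 0.02) -> bool:
    # Gate fails (returns False) if the new pass rate drops more than
    # `tolerance` below the stored baseline.
    return rate >= baseline - tolerance

# Toy usage: a "model" that echoes its prompt, verified by substring checks.
cases = [
    EvalCase("say hello", lambda out: "hello" in out),
    EvalCase("say goodbye", lambda out: "goodbye" in out),
]
echo_model = lambda prompt: prompt
rate = pass_rate(cases, echo_model)  # both verifiers pass -> 1.0
```

In a real harness the verifiers would check tool calls, output schemas, or execution traces rather than substrings, and the failing cases would be logged as candidate training signals (SFT / DPO / RL) rather than discarded.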
Qualifications
Required
Strong experience building evaluation systems for ML models (LLMs strongly preferred)
Excellent software engineering fundamentals:
Python
Data pipelines
Test harnesses
Distributed execution
Reproducibility
Deep understanding of agentic failure modes, including:
Tool misuse
Hallucinated evidence
Reward hacking
Brittle formatting and schema drift
Ability to reason about what to measure, not just how to measure it
Comfortable operating between research experimentation and production systems
Benefits
High autonomy, strong technical peers, and meaningful equity