LLM Evaluation Engineering Lead jobs in United States

DeepRec.ai · 2 weeks ago

LLM Evaluation Engineering Lead

DeepRec.ai is a deep-tech AI company building autonomous systems for complex environments. It is seeking an LLM Evaluation Engineering Lead to own evaluation and verification for agentic LLM systems, ensuring that these systems improve and function reliably.

Hiring Manager
Ben Reavill

Responsibilities

Build eval harnesses for agentic LLM systems (offline + in-workflow)
Design evals for planning, execution, recovery, and safety
Implement verifier-driven scoring and regression gates
Turn eval failures into training signals (SFT / DPO / RL)

Qualifications

Evaluation systems for ML, Python, Data pipelines, Test harnesses, Distributed execution, Reproducibility, Agentic failure modes, Reasoning about measurements, Research experimentation, Production systems

Required

Strong experience building evaluation systems for ML models (LLMs strongly preferred)
Excellent software engineering fundamentals: Python, data pipelines, test harnesses, distributed execution, and reproducibility
Deep understanding of agentic failure modes, including tool misuse, hallucinated evidence, reward hacking, and brittle formatting and schema drift
Ability to reason about what to measure, not just how to measure it
Comfortable operating between research experimentation and production systems

Benefits

High autonomy, strong technical peers, and meaningful equity

Company

DeepRec.ai

We are your Deep Tech recruitment specialists, driven by a mission to power progress in the world’s most exciting industries.

Funding

Current Stage
Growth Stage

Leadership Team

Hayley Killengrey
Co-Founder and Managing Director USA
Company data provided by Crunchbase