AI Evaluation Engineer jobs in United States
cer-icon
Apply on Employer Site
company-logo

DeepRec.ai · 1 week ago

AI Evaluation Engineer

DeepRec.ai is a mission-driven tech company focused on building AI-enabled products for governments and nonprofits. They are seeking an AI Evaluation Engineer to design and own evaluation systems that ensure AI features are safe and reliable before deployment.

Hiring Manager
Ben Reavill
linkedin

Responsibilities

Own the evaluation stack – design frameworks that define “good,” “risky,” and “catastrophic” outputs
Automate at scale – build data pipelines, LLM judges, and integrate with CI to block unsafe releases
Stress testing – red team AI systems with challenge prompts to expose brittleness, bias, or jailbreaks
Track and monitor – establish model/prompt versioning, build observability, and create incident response playbooks
Empower others – deliver tooling, APIs, and dashboards that put eval into every engineer’s workflow

Qualification

OpenAI APILLM ecosystemsSoftware engineeringStatistical analysisData quality validationTypeScriptPrompting techniquesMonitoring toolsData science tooling

Required

Strong software engineering background (TypeScript a plus)
Deep experience with OpenAI API or similar LLM ecosystems
Practical knowledge of prompting, function calling, and eval techniques (e.g. LLM grading, moderation APIs)
Familiarity with statistical analysis and validating data quality/performance

Preferred

experience with observability, monitoring, or data science tooling

Company

DeepRec.ai

twitter
company-logo
We are your Deep Tech recruitment specialists, driven by a mission to power progress in the world’s most exciting industries.

Funding

Current Stage
Growth Stage

Leadership Team

leader-logo
Hayley Killengrey
Co-Founder and Managing Director USA
linkedin
Company data provided by crunchbase