
Grafton Sciences · 2 hours ago

LLM Evals Engineering Lead

Grafton Sciences is building AI systems with general physical ability, aiming to push the frontier of physical AI. The LLM Evals Engineering Lead will build the evaluation and verification layer for LLM systems, with a focus on autonomous workflows, and will collaborate with engineering teams across the company.

Machine Learning, Robotics
H1B Sponsor Likely

Responsibilities

Build an eval harness for agentic LLM systems (offline, simulator-in-the-loop, and workflow-in-the-loop)
Design evals for long-horizon planning, specific agent-call correctness, recovery behavior, and safety/constraint adherence
Support verifier-driven scoring (symbolic checks, simulation/twin checks, surrogate checks) and automated self-correction of the execution pipeline
Create regression gates and release criteria for model/prompt/toolchain changes; prevent capability and safety regressions
Define metrics for outlier identification and for efficient question-asking that reduces uncertainty per unit time
Partner with training teams to turn eval failures into data (SFT/DPO/RL signals) and continuously improve the suite

Qualifications

Evaluation systems for ML models, Software engineering (Python), Data pipelines, Test harnesses, Distributed execution, Reproducibility, Agentic failure modes, Collaboration, Adaptability

Required

Strong experience building evaluation systems for ML models
Excellent software engineering skills (Python, data pipelines, test harnesses, distributed execution, reproducibility)
Deep understanding of agentic failure modes (tool misuse, hallucinated evidence, reward hacking, brittle formatting) and how to measure them
Ability to work across research and production systems in a fast-moving environment

Preferred

Experience working with LLMs

Benefits

Meaningful equity
Benefits

Company

Grafton Sciences

Building systems of general physical ability to enable superintelligence

H1B Sponsorship

Grafton Sciences has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships: 2025 (2)

Funding

Current Stage
Early Stage
Company data provided by Crunchbase