AI Evals Technical Lead jobs in the United States

P-1 AI · 3 hours ago

AI Evals Technical Lead

P-1 AI is building an engineering AGI intended to have a significant impact on the built world. The AI Evals Technical Lead will develop and validate the evaluation tests that ensure P-1 AI's AI, Archie, meets industry skill expectations and performs engineering tasks effectively.

Artificial Intelligence (AI) · Software · Web Development
No H1B

Responsibilities

Implement the system for organizing, transforming, running, grading, and reporting on eval benchmarks
Design and execute the process by which we develop and QA our evals, incorporating contributions from our own engineering team, industrial partners, and subject-matter experts
Ensure that evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform and the experiments we’re performing around it
Create methods for detecting and testing for common quality challenges of AI, including hallucinations, undesirable stochasticity, and regressions
Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stacks

Qualifications

Python programming, Test suite construction, Eval design, Metrics design, Visualization, CI/CD systems, Communication skills, Fast-paced environment adaptability

Required

Feel an unshakeable pull to work on agentic AI
Can usually break an AI or a piece of software in under a minute (if you want to)
Are a skilled developer yourself
Always develop an interest in the subject matter you're building tests for, and are eager to do the same for the industrial products that run the world
Believe in manifesting the future of physical engineering
Experience in constructing comprehensive test suites for software and/or AI systems, including coordinating the contributions of others
Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations
Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers)
Proficiency in Python programming (including complex modules) and in modern software development tools and practices (Git, CI/CD, etc.)
Ability to thrive in a fast-paced, dynamic startup environment

Preferred

Experience in developing, managing, and running evals against LLM-based systems is a strong plus

Company

P-1 AI

P-1 AI is a technology company focused on developing an artificial general engineering intelligence (AGEI).

Funding

Current Stage: Early Stage
Total Funding: $23M
Key Investors: Radical Ventures, Village Global
2025-04-28 · Seed · $23M
2024-07-30 · Pre-Seed

Leadership Team

Paul Eremenko
Co-Founder & CEO
Company data provided by Crunchbase