Evaluation Engineer jobs in United States

Elicit · 2 days ago

Evaluation Engineer

Elicit is an AI research platform focused on enhancing decision-making through innovative evaluation systems. The Evaluation Engineer will own the technical foundation of auto-evaluation systems, ensuring they are fast, reliable, and user-friendly for various stakeholders, including ML engineers and product managers.

Artificial Intelligence (AI) · Data Center Automation · Database · Information Technology
Growth Opportunities

Responsibilities

Build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals
Build lightning-fast basic evals infrastructure that schedules tasks so it introduces practically no latency
Find clever ways to eliminate the fundamental sources of latency
Ensure evals kick off automatically on relevant commits, with results ML engineers can see at a glance and drill into
Provide product managers with dashboards showing performance over time and what's going wrong in production
Ensure your code is well-architected so other team members and ML engineers can understand and build on it
Evaluate how well Elicit actually helps with decision-making in pharma
Provide appropriate statistical tests and confidence intervals so we can trust our results
Work closely with the evals team to build and improve specific evals
Mentor our evals engineering intern
Learn how people interact with the eval system so you can make it work better for them
Understand what our users want from Elicit so evals measure what matters

Qualifications

Software Engineering · Backend Systems · Statistics · Advanced Python · Front-end Development · Developer Tools · Pharma Knowledge · ML Systems Evaluation · Language Model Systems · UX Sensibility

Required

At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines, etc.)
Aptitude and interest in evaluating how Elicit helps with pharma decision-making. No particular experience is required, but we'll assess your aptitude.

Preferred

Knowledge of statistics (e.g., calculating power and credence intervals for evals)
Experience with advanced Python (asyncio/trio and parallel processing strategies)
Front-end experience and strong UX sensibility (you'll be building dashboards). TypeScript experience is a plus
Experience building developer tools (ML engineers are one of your most important clients)
Previous experience as a data engineer or working on AI infrastructure
Knowledge of pharma/biomed
Experience evaluating ML systems
Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)

Benefits

Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family
Flexible vacation policy, with a recommended minimum of 20 days per year plus company holidays
401(k) with a 6% employer match
A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
$1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events
A team administrative assistant who can help you with personal and work tasks

Company

Elicit

Elicit uses language models to help users automate research workflows.

Funding

Current Stage: Early Stage
Total Funding: $31M
Key Investors: Fifty Years
Funding Rounds:
2025-02-26 · Series A · $22M
2023-09-25 · Seed · $9M

Leadership Team

Andreas Stuhlmüller
Cofounder & CEO
Jungwon Byun
Cofounder & COO
Company data provided by Crunchbase