Elicit · 1 day ago

Evaluation Engineer

United States

Full-time

Remote

Mid, Senior Level

$140K/yr - $200K/yr

3+ years exp

Elicit is an AI research platform that uses language models to help researchers make better decisions. The Evaluation Engineer will own the technical foundation of auto-evaluation systems, ensuring they are fast, reliable, and user-friendly while focusing on decision-making in pharma.

Artificial Intelligence (AI) · Data Center Automation · Database · Information Technology

Responsibilities

You'll build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:

You'll build a lightning-fast basic evals infrastructure that schedules tasks with practically no added latency. Then you'll figure out clever ways to reduce the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs)

ML engineers need evals to kick off automatically on relevant commits, with results they can see at a glance and drill into

Product managers need dashboards showing performance over time and what's going wrong in production

Your code must be well-architected so other team members and ML engineers can understand and build on it

We need to evaluate how well Elicit actually helps with decision-making in pharma, not just measure what's easy to measure

This requires encoding real knowledge about how pharma customers make decisions (for example, choosing appropriate gold standards)

You'll provide appropriate statistical tests and confidence intervals so we can trust our results

In a typical month, expect to spend:

60% working on the core eval platform

15% working closely with the evals team to build and improve specific evals (e.g., an eval of our paper search within our systematic review flow)

10% mentoring our evals engineering intern

The rest on learning how people interact with the eval system so you can make it work better for them, and understanding what our users want from Elicit so evals measure what matters

Qualifications

Backend systems · Statistical tests · Advanced Python · Front-end experience · Developer tools · Pharma knowledge · ML systems evaluation · UX sensibility · Language-model systems · Mentoring

Required

At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines, etc.)

Aptitude for and interest in evaluating how Elicit helps with pharma decision-making. There's no particular experience you must have, but we'll evaluate your aptitude

Preferred

Knowledge of statistics (e.g., for calculating power and credence intervals for evals)

Experience with advanced Python (asyncio/trio and parallel processing strategies)

Front-end experience and strong UX sensibility (you'll be building dashboards). TypeScript experience is a plus

Experience building developer tools (ML engineers are one of your most important clients)

Previous experience as a data engineer or working on AI infrastructure

Knowledge of pharma/biomed

Experience evaluating ML systems

Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)

Benefits

Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events

Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family

Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays

401(k) with a 6% employer match

A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter

$1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events

A team administrative assistant who can help you with personal and work tasks

Company

Elicit

Elicit uses language models to help users automate research workflows.

Founded in 2023

Oakland, California, USA

11-50 employees

https://elicit.com

Funding

Current Stage

Early Stage

Total Funding

$31M

Key Investors

Fifty Years

2025-02-26 · Series A · $22M

2023-09-25 · Seed · $9M

Leadership Team

Andreas Stuhlmüller

Cofounder & CEO

Jungwon Byun

Cofounder & COO

Company data provided by Crunchbase