Veeva Systems

AI Data Engineer

San Luis Obispo, CA

Full-time

Hybrid

Mid, Senior Level

$85K/yr - $225K/yr

5+ years exp

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster. The AI Data Engineer role is responsible for ensuring the reliability, accuracy, and safety of Veeva AI Agents through rigorous evaluation and systematic validation methodologies.

Biotechnology · CRM · Enterprise Software · Software

No H-1B sponsorship

Responsibilities

Define and establish comprehensive evaluation strategies for new AI Agents

Prioritize the integrity and coverage of test data sets so that they reflect real-world usage and potential failure modes

Programmatically and manually evaluate the quality of LLM-generated content against predefined metrics (e.g., factual accuracy, contextual relevance, coherence, and safety standards)

Design, curate, and generate diverse, high-quality test data sets, including challenging prompts and scenarios

Evaluate LLM outputs to proactively identify system biases, unsafe content, hallucinations, and critical edge cases

Develop, implement, and maintain scalable automated evaluations that provide efficient, continuous validation of agent behavior and prevent regressions as new features and model updates are released

Understand model behaviors and assist in tracing and root-cause analysis of identified defects or performance degradations

Clearly document, track, and communicate performance metrics, validation results, and bug status to the broader development and product teams

Qualifications

Data Integrity & Validation · Prompt Engineering & Model Expertise · Automated Evaluation Implementation · Programming & Frameworks · Debugging Agentic Systems · Analytical Problem-Solving · Curiosity · Communication Skills · High Work Ethic · Integrity

Required

A meticulous, critical, and curious mindset with a dedication to product quality in a rapidly evolving technological domain

Exceptional analytical and systematic problem-solving capabilities

Excellent ability to communicate technical findings to both engineering and product management audiences

Ability to learn application areas quickly

A strong, specialized understanding of data quality principles, including methods for validating data sets for bias, integrity issues, and conformance to quality standards

Ability to craft diverse and adversarial test data to uncover AI edge cases

Demonstrated skill in advanced prompt engineering techniques to create evaluation scenarios that test the AI's reasoning, action planning, and adherence to system instructions

Deep knowledge of common LLM failure modes (hallucination, incoherence, jailbreaking)

5+ years of experience designing and deploying automated evaluation pipelines to assess complex, agentic AI behaviors

Familiarity with quality metrics such as task success rate, semantic similarity, and sentiment analysis for measuring output quality

Must be comfortable with the specific challenges of debugging agentic systems, including tracing and interpreting an agent's internal reasoning, tool use, and action sequence to pinpoint failure points

5+ years of experience using Python to develop custom evaluation frameworks, write scripts, and integrate pipelines with CI/CD systems

Familiarity with standard test automation tools (e.g., pytest, modern web automation tools)

Bachelor's degree in Data Science, Machine Learning, Computer Science, or a related field, with experience in Gen AI / LLMs

High work ethic. Veeva is a hard-working company

High integrity and honesty. Veeva is a PBC (public benefit corporation) and a 'do the right thing' company. We expect that from all employees

Applicants must have the unrestricted right to work in the United States or Canada. Veeva will not provide sponsorship at this time