UP.Labs
Sr. AI Quality Engineer
UP.Labs is building a cutting-edge AI billing platform for the transportation and logistics industry. The role involves owning end-to-end quality for the AI-powered inference system, developing quality rubrics, and diagnosing issues across the product stack to ensure system reliability and accuracy.
Industry: Venture Capital & Private Equity
Responsibilities
Own end-to-end system quality
Develop and maintain a quality rubric for key use cases and exception types (what “right” looks like, and what failure looks like)
Build and curate golden datasets (representative emails + expected structured output + expected final outcome), including customer-specific variations
Own ongoing quality review in dev and production: regularly inspect high-volume outputs, diagnose what’s breaking and why, and convert discoveries into concrete roadmap items and regression coverage
Define and execute regression tests for new model changes, backend logic changes, or customer-specific use cases (see the sketch after this list)
Investigate and diagnose issues across the full stack of the product
Triage quality incidents and ambiguous failures by tracing through:
Email ingestion/parsing
Prompts / model outputs / normalization steps / data contracts
Intermediate structured representations
Event streams and state-machine transitions
Final audit exception generation and downstream reporting
Use logs, traces, event histories, and data queries to isolate root cause
Produce high-signal findings reports: minimal reproduction, suspected component, evidence, impact, and recommended fix
Build scalable quality operations
Create a repeatable triage playbook and classification system for quality issues
Define monitoring & dashboards for quality signals (volume anomalies, exception drift, per-customer error hotspots)
Partner with engineering/AI to improve observability (correlation IDs, structured logging, traceability from email → state transitions)
Act as a product/domain translator
Understand freight billing workflows and how real-world documents and communication map to our system’s model of “truth”
Convert customer-specific requirements into testable rules and expected outcomes
Identify systemic gaps where “reality” doesn’t fit the current schema, and propose product changes
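For illustration only (not part of the posting): a minimal sketch of the golden-dataset regression check described in the responsibilities above, assuming a JSON-lines golden set with illustrative field names (case_id, email_text, expected) and a hypothetical extract_invoice_fields stand-in for the real extraction pipeline.

```python
# Minimal sketch of a golden-dataset regression check.
# `extract_invoice_fields` is a hypothetical stand-in for the real
# LLM/extraction pipeline; the JSONL schema is illustrative.
import json

def extract_invoice_fields(email_text: str) -> dict:
    # Stand-in: the real pipeline call goes here. Returning an empty
    # dict means every asserted field will show up as a diff.
    return {}

def run_regression(golden_path: str) -> int:
    failures = 0
    with open(golden_path) as f:
        for line in f:
            case = json.loads(line)
            actual = extract_invoice_fields(case["email_text"])
            # Compare only the fields the golden case asserts on, so
            # unrelated schema additions don't break the suite.
            diffs = {
                field: (expected, actual.get(field))
                for field, expected in case["expected"].items()
                if actual.get(field) != expected
            }
            if diffs:
                failures += 1
                print(f"FAIL {case.get('case_id')}: {diffs}")
    print(f"{failures} failing case(s)")
    return failures

# Usage: run_regression("golden_cases.jsonl") against a file of lines like
# {"case_id": "acme-detention-01", "email_text": "...",
#  "expected": {"charge_type": "detention", "amount": 250.0}}
```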
Qualifications
Required
Experience in roles that blend quality + investigation + systems thinking (examples: QA engineer in distributed systems, product analyst with deep debugging, LLM quality analyst, solutions engineer owning incident triage)
Demonstrated experience evaluating AI/LLM output quality (extraction/classification, structured outputs, tool calling, RAG, prompt-driven pipelines, or similar)
Strong technical ability to debug production issues using log/trace tools (Datadog, ELK, Honeycomb, OpenTelemetry/Jaeger, etc.) and SQL and/or Python for analysis and reproduction, plus working knowledge of event-driven architectures and workflow/state-machine systems (or similar distributed workflow systems)
Ability to write crisp requirements and acceptance criteria, and translate ambiguity into test cases
Comfort operating in messy, high-volume, edge-case-heavy environments
Preferred
Freight/logistics/audit/billing domain experience (carrier invoices, accessorials, detention, lumper, fuel surcharge, tenders, BOLs, rate confirmations, PODs, etc.)
Experience designing evaluation metrics (precision/recall, drift detection, per-customer or per-use-case scorecards); see the sketch after this list
Familiarity with workflow engines/state machines and distributed systems failure modes (event ordering, retries, dedupe, idempotency, partial failure)
Experience with annotation/labeling workflows, taxonomy design, and building human-in-the-loop QA processes
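Likewise illustrative only: a small sketch of the per-customer precision/recall scorecard idea mentioned above, for a binary "should this invoice raise an audit exception?" judgment. The record layout (customer, predicted, actual) is an assumption, not from the posting.

```python
# Per-customer precision/recall scorecard for a binary exception
# judgment. Record layout (customer, predicted, actual) is illustrative.
from collections import defaultdict

def scorecard(records):
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for r in records:
        c = counts[r["customer"]]
        if r["predicted"] and r["actual"]:
            c["tp"] += 1
        elif r["predicted"]:
            c["fp"] += 1
        elif r["actual"]:
            c["fn"] += 1
    result = {}
    for customer, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        result[customer] = {
            "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
            "recall": c["tp"] / actual_pos if actual_pos else 0.0,
            **c,
        }
    return result

# Toy usage:
print(scorecard([
    {"customer": "acme", "predicted": True, "actual": True},
    {"customer": "acme", "predicted": True, "actual": False},
    {"customer": "globex", "predicted": False, "actual": True},
]))
```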
Company
UP.Labs
UP.Labs is a first-of-its-kind venture lab unlocking the future of transportation and mobility.
Funding
Current Stage: Early Stage
Company data provided by Crunchbase