Apply on Employer Site

Meridial Marketplace, by Invisible · 6 days ago

AI QA Trainer - LLM Evaluation - Freelance Project

United States

Contract

Remote

Mid, Senior Level

$6/hr - $65/hr

Meridial Marketplace, by Invisible is seeking an AI QA Trainer to contribute to the evaluation of large-scale language models. The role involves verifying model reasoning and reliability through rigorous evaluation, designing test plans, and suggesting improvements to enhance model performance.

Computer Software

Responsibilities

Converse with the model on real-world scenarios and evaluation prompts

Verify factual accuracy and logical soundness

Design and run test plans and regression suites

Build clear rubrics and pass/fail criteria

Capture reproducible error traces with root-cause hypotheses

Suggest improvements to prompt engineering, guardrails, and evaluation metrics

Partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time

Qualification

Model evaluationLLM safetyTest automationBias/fairness auditingAdversarial testingRegression testingEvaluation rubric designGrounding verificationPythonSQLClear communication

Required

A bachelor's, master's, or PhD in computer science, data science, computational linguistics, statistics, or a related field

expertise in model evaluation

LLM safety

prompt robustness

data quality assurance

multilingual and domain-specific testing

grounding verification

compliance/readiness checks

hallucination detection

factual consistency

prompt-injection and jailbreak resistance

bias/fairness audits

chain-of-reasoning reliability

tool-use correctness

retrieval-augmentation fidelity

end-to-end workflow validation

design and run test plans and regression suites

build clear rubrics and pass/fail criteria

capture reproducible error traces with root-cause hypotheses

suggest improvements to prompt engineering, guardrails, and evaluation metrics (e.g., precision/recall, faithfulness, toxicity, and latency SLOs)

partner on adversarial red-teaming

automation (Python/SQL)

dashboarding to track quality deltas over time

clear, metacognitive communication

Preferred

shipped QA for ML/AI systems

safety/red-team experience

test automation frameworks (e.g., PyTest)

hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B)

evaluation rubric design

adversarial testing/red-teaming

regression testing at scale

bias/fairness auditing

grounding verification

prompt and system-prompt engineering

test automation (Python/SQL)

high-signal bug reporting

Benefits

Health insurance

PTO

Company

Meridial Marketplace, by Invisible

We are the AI training and scaling partner for the leading foundation model providers, enterprises, and governments, bridging the gap between AI potential and production.

Founded in 2015

5001-10000 employees

https://www.meridial.ai/

Funding

Current Stage

Late Stage

Company data provided by crunchbase