AI QA Trainer - LLM Evaluation - Freelance Project jobs in United States
cer-icon
Apply on Employer Site
company-logo

Meridial Marketplace, by Invisible ยท 6 days ago

AI QA Trainer - LLM Evaluation - Freelance Project

Meridial Marketplace, by Invisible is seeking an AI QA Trainer to contribute to the evaluation of large-scale language models. The role involves verifying model reasoning and reliability through rigorous evaluation, designing test plans, and suggesting improvements to enhance model performance.

Computer Software

Responsibilities

Converse with the model on real-world scenarios and evaluation prompts
Verify factual accuracy and logical soundness
Design and run test plans and regression suites
Build clear rubrics and pass/fail criteria
Capture reproducible error traces with root-cause hypotheses
Suggest improvements to prompt engineering, guardrails, and evaluation metrics
Partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time

Qualification

Model evaluationLLM safetyTest automationBias/fairness auditingAdversarial testingRegression testingEvaluation rubric designGrounding verificationPythonSQLClear communication

Required

A bachelor's, master's, or PhD in computer science, data science, computational linguistics, statistics, or a related field
expertise in model evaluation
LLM safety
prompt robustness
data quality assurance
multilingual and domain-specific testing
grounding verification
compliance/readiness checks
hallucination detection
factual consistency
prompt-injection and jailbreak resistance
bias/fairness audits
chain-of-reasoning reliability
tool-use correctness
retrieval-augmentation fidelity
end-to-end workflow validation
design and run test plans and regression suites
build clear rubrics and pass/fail criteria
capture reproducible error traces with root-cause hypotheses
suggest improvements to prompt engineering, guardrails, and evaluation metrics (e.g., precision/recall, faithfulness, toxicity, and latency SLOs)
partner on adversarial red-teaming
automation (Python/SQL)
dashboarding to track quality deltas over time
clear, metacognitive communication

Preferred

shipped QA for ML/AI systems
safety/red-team experience
test automation frameworks (e.g., PyTest)
hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B)
evaluation rubric design
adversarial testing/red-teaming
regression testing at scale
bias/fairness auditing
grounding verification
prompt and system-prompt engineering
test automation (Python/SQL)
high-signal bug reporting

Benefits

Health insurance
PTO

Company

Meridial Marketplace, by Invisible

twitter
company-logo
We are the AI training and scaling partner for the leading foundation model providers, enterprises, and governments, bridging the gap between AI potential and production.

Funding

Current Stage
Late Stage
Company data provided by crunchbase