Meridial Marketplace, by Invisible · 6 days ago
AI QA Trainer - LLM Evaluation - Freelance Project
Meridial Marketplace, by Invisible, is seeking an AI QA Trainer to contribute to the evaluation of large-scale language models. The role involves verifying model reasoning and reliability through rigorous evaluation, designing test plans, and suggesting improvements to enhance model performance.
Computer Software
Responsibilities
Converse with the model on real-world scenarios and evaluation prompts
Verify factual accuracy and logical soundness
Design and run test plans and regression suites
Build clear rubrics and pass/fail criteria
Capture reproducible error traces with root-cause hypotheses
Suggest improvements to prompt engineering, guardrails, and evaluation metrics
Partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time
Qualifications
Required
A bachelor's, master's, or PhD in computer science, data science, computational linguistics, statistics, or a related field
Expertise in model evaluation
LLM safety
prompt robustness
data quality assurance
multilingual and domain-specific testing
grounding verification
compliance/readiness checks
hallucination detection
factual consistency
prompt-injection and jailbreak resistance
bias/fairness audits
chain-of-reasoning reliability
tool-use correctness
retrieval-augmentation fidelity
end-to-end workflow validation
design and run test plans and regression suites
build clear rubrics and pass/fail criteria
capture reproducible error traces with root-cause hypotheses
suggest improvements to prompt engineering, guardrails, and evaluation metrics (e.g., precision/recall, faithfulness, toxicity, and latency SLOs)
partner on adversarial red-teaming
automation (Python/SQL)
dashboarding to track quality deltas over time
clear, metacognitive communication
Preferred
shipped QA for ML/AI systems
safety/red-team experience
test automation frameworks (e.g., PyTest)
hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B)
evaluation rubric design
adversarial testing/red-teaming
regression testing at scale
bias/fairness auditing
grounding verification
prompt and system-prompt engineering
test automation (Python/SQL)
high-signal bug reporting
Benefits
Health insurance
PTO
Company
Meridial Marketplace, by Invisible
We are the AI training and scaling partner for the leading foundation model providers, enterprises, and governments, bridging the gap between AI potential and production.
Funding
Current Stage
Late Stage
Company data provided by Crunchbase