Evaluation Scenario Writer – AI Agent Testing Specialist (Remote, Contract)
OpenTrain AI is seeking an Evaluation Scenario Writer to enhance the testing of LLM-based agents in realistic environments. The role involves designing structured evaluation scenarios, defining agent behavior, and collaborating with developers to refine evaluation frameworks.
Responsibilities
Design realistic, structured evaluation scenarios that simulate human-performed tasks
Define gold-standard (“gold path”) agent behavior and acceptable variations
Annotate task steps, expected outputs, edge cases, and scoring logic
Review agent outputs and iterate on scenarios for clarity, coverage, and realism
Collaborate with developers and other contributors to test and refine evaluation frameworks
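To make the responsibilities above concrete, here is a minimal sketch of what a structured evaluation scenario with a gold path and simple scoring logic might look like. All field names, step names, and the scoring function are illustrative assumptions, not OpenTrain AI's actual schema or framework.

```python
# Hypothetical evaluation scenario: task steps, gold-standard path,
# acceptable variations, edge cases, and scoring logic in one structure.
scenario = {
    "task": "Book a one-way flight from BOS to SFO",
    "gold_path": [
        "open_search_form",
        "enter_origin:BOS",
        "enter_destination:SFO",
        "select_date",
        "submit_search",
        "select_cheapest_result",
    ],
    # Variations the scorer would tolerate (e.g., steps some UIs reorder).
    "acceptable_variations": {
        "order_insensitive": [["select_date", "submit_search"]],
    },
    "edge_cases": ["no flights available", "ambiguous airport code"],
}

def score(agent_steps, scenario):
    """Fraction of gold-path steps the agent performed, in order."""
    gold = scenario["gold_path"]
    matched = 0
    for step in agent_steps:
        if matched < len(gold) and step == gold[matched]:
            matched += 1
    return matched / len(gold)
```

In practice such scenarios are typically serialized as JSON or YAML (hence the structured-formats requirement below), and the scoring logic would live in the evaluation framework rather than alongside the data.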
Qualifications
Required
Strong analytical thinking and QA-style reasoning
Excellent written English and clear documentation skills
Comfort working with structured formats like JSON/YAML
Basic Python and JavaScript experience
Preferred
Background in software testing, QA, data analysis, or NLP annotation