Mindrift · 22 hours ago
Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. The Evaluation Scenario Writer will create structured test cases, define evaluation behaviors, analyze agent logs, and ensure scenarios are production-ready.
Computer Software
Responsibilities
Create structured test cases that simulate complex human workflows
Define gold-standard behavior and scoring logic to evaluate agent actions
Analyze agent logs, failure modes, and decision paths
Work with code repositories and test frameworks to validate your scenarios
Iterate on prompts, instructions, and test cases to improve clarity and difficulty
Ensure that scenarios are production-ready, easy to run, and reusable
Qualifications
Required
3+ years of software development experience with a strong Python focus
Experience with Git and code repositories
Comfortable with structured formats like JSON/YAML for scenario description
Understanding of core LLM limitations (hallucinations, bias, context limits) and how these affect evaluation design
Familiarity with Docker
English proficiency - B2
Benefits
Incentive payments
Company
Mindrift
Welcome to Mindrift — a space where innovation meets opportunity.
Funding
Current Stage
Late Stage
Company data provided by Crunchbase