Bespoke Labs
Member of Technical Staff: RL Environments
Bespoke Labs is an applied AI research lab pioneering data and RL environment curation for training and evaluating agents. The role involves developing systematic strategies for creating high-quality RL environments, analyzing agent behavior, and producing benchmark environments for AI agents.
Computer Software
Responsibilities
Develop systematic strategies and recipes for creating high-quality RL environments that effectively train and evaluate agents
Study how LLMs and agents fail across different task types, identifying patterns that inform better environment design
Create benchmark environments that test specific agent capabilities, packaging them for external release on our evaluation platform
Verify environment quality through hands-on testing—training small-scale agents, checking for reward hacking, and analyzing training dynamics
Work with our environment creation pipeline to scale production of validated environments
Analyze agent rollout data to uncover insights about what makes environments challenging, diverse, and pedagogically valuable
Collaborate with the team to ensure benchmarks integrate smoothly into our external-facing dashboards
Establish quality standards and evaluation protocols that maintain high bars as we scale environment production
Qualifications
Required
Strong foundation in machine learning, either through a PhD or MS in ML or CS, or equivalent industry experience
Deep curiosity about agent behavior and failure modes, with the ability to form hypotheses and test them systematically
Experience analyzing complex systems and extracting actionable insights from data
Patience and attention to detail for studying agent rollouts and identifying subtle patterns
Proficiency in Python and ML frameworks (PyTorch, JAX, or similar)
Experience with RL concepts and agent training, even if not from an RL background
Ability to design experiments, run training loops, and interpret results
Comfortable working with cloud platforms (GCP, AWS) for running experiments at scale
Ability to build pipelines and automation that scale research insights into production
Experience with data analysis tools and creating reproducible workflows
Systematic approach to quality verification and testing
Preferred
Hands-on experience with reinforcement learning or agent training systems
Background in data curation, dataset creation, or evaluation benchmark design
Experience with AI safety, robustness testing, or adversarial evaluation
Publications or projects related to RL, agent evaluation, or data-centric AI
Understanding of how to design environments that surface specific failure modes
Experience shipping research artifacts (datasets, benchmarks, evaluation suites) to the community
Benefits
Health coverage
Flexible work arrangements
The opportunity to shape how the AI community evaluates and trains agents
Company
Bespoke Labs
RL for Agents
Funding
Current Stage
Early Stage
Company data provided by Crunchbase