Distyl · 1 week ago
Applied AI Researcher, Benchmarking
Distyl is a company that develops AI-native technologies for collaboration between humans and AI, significantly impacting operations in large enterprises. They are seeking an Applied AI Researcher to join their Benchmarking team, responsible for designing evaluation frameworks and benchmarks to measure the performance of intelligent systems.
Artificial Intelligence (AI) · Generative AI · Information Technology · Software
Responsibilities
The Benchmarking team defines how progress is measured. Researchers design evaluation frameworks that capture reasoning depth, interaction quality, reliability, and operational impact. They construct benchmarks that reflect real-world complexity. Their systems become the standard by which new architectures, techniques, and releases are judged.
Researchers in Benchmarking explore new paradigms for evaluating intelligent systems: adversarial robustness testing, longitudinal performance tracking, and human-in-the-loop assessment. They investigate how metrics shape model behavior and establish rigorous methodologies for quantifying emergent capability. Their insights drive both Distyl’s internal research priorities and industry-wide standards.
Qualifications
Required
Experience Designing and Running Evaluations: You've built or maintained benchmarks, test suites, or experimental frameworks to measure model or system performance
Statistical and Analytical Rigor: You design fair, reproducible experiments and can extract signal from noisy empirical results
Experience Building with Models, Not Just Building Models: We develop intelligent systems using models rather than training or fine-tuning them. Ideal candidates have expertise in compound AI systems, agentic collaboration, and associated techniques (ensembling, ReAct, graph-of-thoughts, etc.)
Proven Track Record of Research Results: Whether you've published in top journals, posted amazing work on Twitter, or shared it somewhere else, we want to see what you've done
Uses AI Every Day: Before you can revolutionize someone else's workflow, you need to revolutionize yours. You should be using tools like ChatGPT, Cursor, and Perplexity to accelerate your workflow
Strong Programming and Data Analysis Skills: While you might not consider yourself a software engineer, you need to be able to build prototypes of your ideas and then run the experiments that prove their effectiveness to a Fortune 500 Head of AI
Bias Toward Showing vs. Telling: Our customers want to see the power of AI today rather than discuss the most elegant idea that will take 5 years to realize
Benefits
Equity options
Medical/dental/vision covered at 100% for you and your dependents
401K plan
Commuter benefits
Lunch provided in office
Company
Distyl
Distyl AI partners with blue-chip leaders to help them create the enterprises of the future.
H1B Sponsorship
Distyl has a track record of offering H1B sponsorships. Please note that this does not
guarantee sponsorship for this specific role. Additional information is provided below for
your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (7)
Funding
Current Stage: Growth Stage
Total Funding: $202M
Key Investors: Lightspeed Venture Partners
2025-09-22 · Series B · $175M
2024-09-24 · Series A · $20M
2023-04-13 · Seed · $7M
Company data provided by Crunchbase