
griddable.io · 6 hours ago

Director, Agentforce Testing Center Engineering

Griddable.io, part of Salesforce, is focused on transforming business through AI, Data, and CRM. They are seeking a technical leader to build and evaluate AI agents, ensuring rigorous evaluation processes that link agent performance to business outcomes.

Analytics · Big Data · Cloud Data Services · Data Integration · Information Technology · SaaS · Software

Responsibilities

Build the "Evaluation Core": Lead the engineering of a scalable evaluation platform that runs in parallel with agent execution
Thread Science & Engineering: Operationalize applied science by turning theoretical benchmarks into production regression tests and establish a discipline of eval-driven development (see the sketch after this list)
Thought Leadership: Act as the internal SME for AI testing. Educate cross-functional partners (Product, UX, ML) on the difference between stochastic AI behavior and traditional deterministic software
You are an engineering leader who can guide the group through technical leadership and process management, maintaining a discipline of high-quality code delivery aided by AI tools as necessary
You are a people leader who ensures teams have clear priorities and adequate resources. You are a multiplier with a passion for the team's and each member's success, providing technical guidance, career development, and mentoring
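
To make the eval-driven development responsibility above concrete, here is a minimal, hypothetical sketch of turning a curated "golden set" into a production regression test. The agent stub, golden cases, and pass-rate threshold are illustrative assumptions, not part of the role description.

```python
# Illustrative only: a minimal sketch of eval-driven development, turning a
# benchmark-style "golden set" into a regression test. The agent call,
# golden cases, and pass-rate threshold are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class GoldenCase:
    prompt: str
    must_contain: str  # simplistic success criterion for this sketch

# Hypothetical golden set; in practice this would be curated and versioned.
GOLDEN_SET = [
    GoldenCase("Reset the user's password", "password reset link"),
    GoldenCase("What is our refund window?", "30 days"),
]

def run_agent(prompt: str) -> str:
    """Stub standing in for a real agent invocation."""
    canned = {
        "Reset the user's password": "I have sent a password reset link.",
        "What is our refund window?": "Refunds are accepted within 30 days.",
    }
    return canned.get(prompt, "")

def pass_rate(cases) -> float:
    passed = sum(case.must_contain in run_agent(case.prompt) for case in cases)
    return passed / len(cases)

def test_agent_regression():
    # Gate releases on an aggregate threshold rather than exact-match outputs,
    # since agent behavior is stochastic rather than deterministic.
    assert pass_rate(GOLDEN_SET) >= 0.9

if __name__ == "__main__":
    test_agent_regression()
    print(f"pass rate: {pass_rate(GOLDEN_SET):.0%}")
```

Gating on an aggregate pass rate rather than exact string matches reflects the stochastic behavior the role is expected to explain to cross-functional partners.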

Qualifications

Agent Evaluation Experience · Applied Science & Engineering · Eval Methodologies · Production-Grade AI Experience · Data Engineering · Simulation Environments · Advanced Degree · Global Team Collaboration · Communication Skills · Organizational Skills · Time Management

Required

Specialized Agent Evaluation Experience: You have specific experience building evaluation harnesses for LLMs or Agents
Applied Science & Engineering Hybrid: You have a track record of managing 'Research Engineering' or 'Applied Science' teams where you had to operationalize vague scientific goals into shipping code. You are comfortable curating 'Golden Sets' of data and building custom benchmarks from scratch
Deep Knowledge of Eval Methodologies: You are fluent in modern evaluation techniques, including:
LLM-as-a-Judge: Validating judges against human ground truth to prevent self-bias (a minimal sketch follows this list)
Behavioral Analysis: Evaluating how an agent thinks (Reasoning Traces/Chain of Thought), not just the final output
Production-Grade AI Experience: You have shipped AI products where you had to manage real-world constraints like token budgets, inference latency, and cost-normalized accuracy, and you bring a pragmatic orientation to building ML solutions that work in production at scale
Familiarity with academic and industry benchmarks and their limitations in a business environment
Experience building simulation environments (mock APIs, virtual users) to stress-test agents safely before deployment
Experience with Data engineering, specifically around data acquisition, creating data pipelines, metric measurement, and analysis
Experience owning highly available services and putting processes in place to maintain uptime
Prior experience working with global teams
Strong verbal and written communication, organizational, and time-management skills
Advanced degree in Computer Science, Machine Learning, or related field with a focus on system evaluation or reliability
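
As an illustration of the LLM-as-a-Judge validation mentioned in the list above, the sketch below checks a hypothetical judge's verdicts against human ground-truth labels using raw agreement and Cohen's kappa. The labels, threshold, and workflow are assumptions for illustration, not a description of any Salesforce system.

```python
# Illustrative only: a minimal sketch of validating an LLM-as-a-Judge against
# human ground truth. The verdicts and thresholds are hypothetical stand-ins.

from collections import Counter

# Hypothetical human verdicts (ground truth) and judge verdicts for the same
# set of agent responses: 1 = acceptable, 0 = not acceptable.
human_labels = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
judge_labels = [1, 1, 0, 1, 1, 1, 1, 0, 0, 0]

def agreement(a, b) -> float:
    """Fraction of items where judge and human agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b) -> float:
    """Chance-corrected agreement; guards against a judge that always says 'pass'."""
    po = agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    n = len(a)
    pe = sum((counts_a[k] / n) * (counts_b[k] / n) for k in set(a) | set(b))
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

if __name__ == "__main__":
    print(f"raw agreement: {agreement(human_labels, judge_labels):.2f}")
    print(f"Cohen's kappa: {cohens_kappa(human_labels, judge_labels):.2f}")
    # Only promote the judge to automated gating once kappa clears a chosen bar
    # (e.g. >= 0.7), and re-validate whenever the judge prompt or model changes.
```

Checking chance-corrected agreement rather than raw accuracy is one simple way to detect self-bias, such as a judge that rubber-stamps every response.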

Company

griddable.io

Griddable.io is a San Jose, CA-based SaaS startup that closed Series A funding in 2017 from August Capital, Artiman Ventures, and Carsten Thoma, founding CEO of Hybris (acquired by SAP).

Funding

Current Stage: Early Stage
Total Funding: $8M
2019-01-28: Acquired
2018-02-28: Series A · $8M

Leadership Team

Burton Hipp
VP of Engineering/Founder
Company data provided by Crunchbase