griddable.io · 7 hours ago
Director, Agentforce Testing Center Engineering
Griddable.io, part of Salesforce, is focused on transforming business through AI, Data, and CRM. They are seeking a technical leader to build and evaluate AI agents, ensuring rigorous evaluation processes that link agent performance to business outcomes.
Analytics · Big Data · Cloud Data Services · Data Integration · Information Technology · SaaS · Software
Responsibilities
Build the "Evaluation Core": Lead the engineering of a scalable evaluation platform that runs in parallel with agent execution
Thread Science & Engineering: Operationalize applied science by turning theoretical benchmarks into production regression tests, and establish a discipline of eval-driven development
Thought Leadership: Act as the internal SME for AI testing. Educate cross-functional partners (Product, UX, ML) on the difference between stochastic AI behavior and traditional deterministic software
You are an Engineering leader who guides the group through technical leadership and process management, and who maintains a discipline of high-quality code delivery, aided by AI tools as necessary
You are a People leader who ensures teams have clear priorities and adequate resources. You are a multiplier with a passion for the success of the team and its members, providing technical guidance, career development, and mentoring
Qualifications
Required
Specialized Agent Evaluation Experience: You have specific experience building evaluation harnesses for LLMs or Agents
Applied Science & Engineering Hybrid: You have a track record of managing 'Research Engineering' or 'Applied Science' teams where you had to operationalize vague scientific goals into shipping code. You are comfortable curating 'Golden Sets' of data and building custom benchmarks from scratch
Deep Knowledge of Eval Methodologies: You are fluent in modern evaluation techniques, including:
LLM-as-a-Judge: Validating judges against human ground truth to prevent self-bias
Behavioral Analysis: Evaluating how an agent thinks (Reasoning Traces/Chain of Thought), not just the final output
Production-Grade AI Experience: You have shipped AI products where you had to manage real-world constraints like token budgets, inference latency, and cost-normalized accuracy. You take a pragmatic approach to building ML solutions that work in production at scale
Familiarity with academic and industry benchmarks and their limitations in a business environment
Experience building simulation environments (mock APIs, virtual users) to stress-test agents safely before deployment
Experience with data engineering, specifically data acquisition, data pipelines, metric measurement, and analysis
Experience owning highly available services and putting processes in place to maintain uptime
Prior experience working with global teams
Strong verbal and written communication, organizational, and time-management skills
Advanced degree in Computer Science, Machine Learning, or related field with a focus on system evaluation or reliability
Company
griddable.io
Griddable.io is a San Jose, CA based SaaS startup that closed Series A funding in 2017 from August Capital, Artiman Ventures, and Carsten Thoma, founding CEO of Hybris (acquired by SAP).