griddable.io · 7 hours ago

Director, Agentforce Testing Center Engineering

San Francisco, CA

Full-time

Onsite

Director/Executive

Griddable.io, part of Salesforce, is focused on transforming business through AI, Data, and CRM. They are seeking a technical leader to build and evaluate AI agents, ensuring rigorous evaluation processes that link agent performance to business outcomes.

Analytics, Big Data, Cloud Data Services, Data Integration, Information Technology, SaaS, Software

Responsibilities

Build the "Evaluation Core": Lead the engineering of a scalable evaluation platform that runs in parallel with agent execution

Thread Science & Engineering: Operationalize applied science by turning theoretical benchmarks into production regression tests, and establish a discipline of eval-driven development

Thought Leadership: Act as the internal SME for AI testing. Educate cross-functional partners (Product, UX, ML) on the difference between stochastic AI behavior and traditional deterministic software

You are an engineering leader who can guide the group through technical leadership and process management, and who maintains a discipline of high-quality code delivery, aided by AI tools as necessary

You are a people leader who ensures teams have clear priorities and adequate resources. You are a multiplier with a passion for the success of the team and its members, providing technical guidance, career development, and mentoring

Qualifications

Agent Evaluation Experience, Applied Science & Engineering, Eval Methodologies, Production-Grade AI Experience, Data Engineering, Simulation Environments, Advanced Degree, Global Team Collaboration, Communication Skills, Organizational Skills, Time Management

Required

Specialized Agent Evaluation Experience: You have specific experience building evaluation harnesses for LLMs or Agents

Applied Science & Engineering Hybrid: You have a track record of managing 'Research Engineering' or 'Applied Science' teams where you had to operationalize vague scientific goals into shipping code. You are comfortable curating 'Golden Sets' of data and building custom benchmarks from scratch

Deep Knowledge of Eval Methodologies: You are fluent in modern evaluation techniques, including:

LLM-as-a-Judge: Validating judges against human ground truth to prevent self-bias

Behavioral Analysis: Evaluating how an agent thinks (Reasoning Traces/Chain of Thought), not just the final output

Production-Grade AI Experience: You have shipped AI products where you had to manage real-world constraints like token budgets, inference latency, and cost-normalized accuracy. You take a pragmatic approach to building ML solutions that work in production at scale

Familiarity with academic and industry benchmarks and their limitations in a business environment

Experience building simulation environments (mock APIs, virtual users) to stress-test agents safely before deployment

Experience with data engineering, specifically data acquisition, data pipeline creation, metric measurement, and analysis

Experience owning highly available services and putting processes in place to maintain uptime

Prior experience working with global teams

Strong verbal and written communication skills, organizational and time management skills

Advanced degree in Computer Science, Machine Learning, or related field with a focus on system evaluation or reliability

Company

griddable.io

Griddable.io is a San Jose, CA based SaaS startup that closed Series A funding in 2017 from August Capital, Artiman Ventures, and Carsten Thoma, founding CEO of Hybris (acquired by SAP).

Founded in 2016

San Jose, California, USA

11-50 employees

https://griddable.io

Funding

Current Stage

Early Stage

Total Funding

$8M

2019-01-28 · Acquired

2018-02-28 · Series A · $8M

Leadership Team

Burton Hipp

VP of Engineering/Founder

Company data provided by Crunchbase