People In AI · 16 hours ago
Senior Platform Engineer, Evaluations (AI Infrastructure)
People In AI is a specialized recruiting partner for cutting-edge AI startups. They are seeking a Senior Platform Engineer to own and operationalize quality definitions for AI agents, building evaluation systems from scratch while collaborating with cross-functional teams.
Responsibilities
Build and own online and offline evaluation pipelines for agent behavior across MELT (metrics, events, logs, traces) data, code, and unstructured inputs
Define and refine quality metrics that capture reasoning trajectories, not just outputs
Design evaluations in messy, real-world, high-volume, hard-to-label systems
Extend core agent infrastructure (e.g., middleware, sub-agents, orchestration) to support evaluation and iteration
Productionize the stack with strong observability, uptime, and performance guarantees
Develop internal tools (e.g., CLIs, validation harnesses) to accelerate iteration for platform, research, and product
Collaborate with research engineers to bring new agent architectures (e.g., multi-path reasoning) into production
Qualifications
Required
Experience in ML platform, data, or backend engineering, ideally in zero-to-one or ambiguous domains
Experience designing evaluation systems in noisy, context-rich environments (e.g., CV, research, infra ML)
Strong systems thinking: can reason across distributed systems, data modeling, and product surface areas
Clean, scalable coding in Python and TypeScript
Ability to define ground truth in approximate domains and defend your metrics with rigor
Track record of platformizing work to unlock others
Benefits
Full coverage
Retirement
Fitness stipend
Unlimited PTO
And more
Company
People In AI
At People In AI, we specialize in staffing solutions for the rapidly expanding AI sector.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase