Mindrift · 1 day ago

Freelance Agent Evaluation Engineer

United States

Part-time

Remote

Mid Level

3+ years of experience

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. The role involves creating structured test cases, defining evaluation logic, and analyzing agent performance to ensure scenarios are production-ready.

Computer Software

Responsibilities

Create structured test cases that simulate complex human workflows

Define gold-standard behavior and scoring logic to evaluate agent actions

Analyze agent logs, failure modes, and decision paths

Work with code repositories and test frameworks to validate your scenarios

Iterate on prompts, instructions, and test cases to improve clarity and difficulty

Ensure that scenarios are production-ready, easy to run, and reusable

Qualifications

Python, Git, JSON/YAML, Docker, LLM understanding, English proficiency

Required

3+ years of software development experience with a strong Python focus

Experience with Git and code repositories

Comfort with structured formats such as JSON and YAML for describing scenarios

Understanding of core LLM limitations (hallucinations, bias, context limits) and how they affect evaluation design

Familiarity with Docker

English proficiency - B2

Benefits

Fixed project rate or individual rates, depending on the project

Some projects include incentive payments

Company

Mindrift

Welcome to Mindrift — a space where innovation meets opportunity.

501-1000 employees

https://mindrift.ai

Funding

Current Stage

Late Stage

Company data provided by Crunchbase