Mental & Mentla · 1 month ago

AI Research Engineer

San Francisco Bay Area

Full-time

Hybrid

Mid, Senior Level

Pathos is focused on creating innovative AI solutions for therapy and is seeking an AI Research Engineer to design, train, and iterate on its AI Therapist. The role involves improving the quality of AI therapy, maintaining evaluation systems, and collaborating with clinicians and engineers to expand product capabilities.

Fitness · Health Care · Wellness

Responsibilities

Improve Quality of AI Therapy: Deliver measurable improvements in conversation quality, therapeutic alliance, and user outcomes through fine-tuning strategies, training data curation, building RL environments, new model architectures, and other AI innovations

Improve Evaluation of AI Quality: Improve and maintain a robust eval stack that includes scripted tests, LLM-as-judge evaluations, human ratings, and safety checks. Strengthen automated regression testing, defect detection, and observability (e.g., dashboards)

Own the AI System: Build, maintain, and iterate on the production codebase that delivers AI therapy and supports the evaluation and iteration of our AI

Productionize Models and Pipelines: Own the path from notebook to production, including training jobs, model packaging, deployment, monitoring, and rollback strategies. Keep latency, reliability, and cost within agreed budgets while enabling rapid iteration on new ideas

Improve Safety, Alignment, and Clinical Guardrails: Work with clinicians and internal experts to encode clinical guidelines into prompts, reward functions, tools, and filters. Proactively identify and reduce harmful or low-quality behaviors through targeted experiments, red teaming, and mitigations

Own the Research Roadmap and Experiment Velocity: Run high-quality experiments, from hypothesis through analysis, to improve our understanding of what matters and what works. Shape and execute a focused R&D roadmap

Collaborate with Clinicians, Product, and Engineering: Translate product and clinical requirements into concrete model and system changes. Partner with full-stack product engineers so that new AI capabilities are easy to integrate and maintain in the product

Qualifications

Large Language Models · Data Engineering · Production Code · Scientific Mindset · AI Safety · Alignment · Prioritization · Quality Focus · Collaboration · Communication

Required

Design, train, ship, iterate on, and innovate on the AI brains behind Pathos' AI Therapist

Demonstrates strong experience with large language models, including fine-tuning, training data design, and model selection

Knows how to move core metrics on conversation quality and user outcomes, rather than chasing generic benchmarks

Can look at evals, transcripts, and metrics and quickly form grounded hypotheses for improvement

Ships clean, maintainable, high-quality code

Experience shipping production-level code and/or maintaining an AI system in production

Can set up production-level data pipelines for model training, evals, analysis, etc.

You formulate hypotheses and are good at evaluating them (e.g., through experiments or data analysis)

You are consistently learning at the cutting edge, and you're able to leverage and communicate those learnings to make the entire company more successful

You are keenly aware of how to create value for the company and prioritize projects accordingly

Refuses to ship subpar work and continuously improves the codebase

Prioritizes speed by leveraging AI, breaking down complex tasks, shipping early, optimizing for learnings, iterating quickly, and avoiding over-engineering

You collaborate positively and constructively with others

Sees others' perspectives

Strong opinions, loosely held

Focused on user/business value, not ego

Preferred

Personal or other experience with therapy or coaching

Domain knowledge of psychology, neuroscience, therapy, or coaching

Company

Mental & Mentla

AI therapy and tools for daily living.

Founded in 2022

Honolulu, Hawaii, USA

2-10 employees

https://www.getmental.com/

Funding

Current Stage

Early Stage

Total Funding

$10.12M

2025-12-02 · Series Unknown · $10.12M

2022-10-01 · Seed

2022-08-24 · Series Unknown

Company data provided by Crunchbase