Together AI · 6 hours ago

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

San Francisco, CA

Full-time

Onsite

Senior Level

$220K/yr - $270K/yr

Together AI is a research-driven artificial intelligence company focused on building advanced open-source-aligned LLMs. The Senior Research Engineer will develop evaluation systems to ensure model reliability and performance, collaborating closely with various teams to shape datasets and influence model improvements.

Artificial Intelligence (AI), Generative AI, Internet, IT Infrastructure, Open Source

H1B Sponsor Likely

Responsibilities

Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors

Develop specialized evaluation suites for:

Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery

Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences

Tool-augmented interactions: search, retrieval, code execution, and API-driven actions

Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing

Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains

Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection, shaping strategies, objective design, and system improvements

Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases

Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers

Qualifications

Python, LLMs experience, Evaluation tooling, Experiment design, Distributed workflows, GPU environments, Multi-turn reasoning, Behavioral analysis

Required

Strong engineering skills with Python, evaluation tooling, and distributed workflows

Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming

Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns

Experience designing experiments, building datasets, and interpreting noisy behavioral signals

Understanding of function calling and structured output formats

Familiarity with GPU or distributed compute environments

Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines

Experience with multi-turn or multi-step reasoning tasks

Familiarity with inference systems, distributed infrastructure, or post-training workflows

Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures

Benefits

Competitive compensation

Startup equity

Health insurance

Other benefits

Company

Together AI

Together AI is a cloud platform for building open-source generative AI and the infrastructure for developing AI models.

Founded in 2022

San Francisco, California, USA

201-500 employees

https://www.together.ai

H1B Sponsorship

Together AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data source: US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not shown)

Trends of Total Sponsorships

2025 (19)

2024 (6)

2023 (3)

Funding

Current Stage

Growth Stage

Total Funding

$533.5M

Key Investors

Salesforce Ventures, Lux Capital

2025-02-20 · Series B · $305M

2024-03-13 · Series A · $106M

2023-11-29 · Series A · $102.5M

Leadership Team

Vipul Ved Prakash

Co-Founder & CEO

Kae Ike Lim

Executive Assistant to Co-Founder and CEO

Recent News

Morningstar.com

AI21 Labs and Together AI Partner to Expand Access to Open-Source Models

2025-11-27

prnasia.com

PEGATRON Strengthens AI Infrastructure Collaboration with Together AI and 5C for NVIDIA GB300 NVL72 and NVIDIA HGX B200 Liquid-Cooled Rack Deployment in U.S. Data Centers

2025-11-19

KrASIA

Coding tools Cursor and Windsurf found using Chinese AI in latest releases

2025-11-07

Company data provided by Crunchbase