Senior Research Engineer, LLM Evaluation and Behavioral Analysis jobs in United States

Together AI · 6 hours ago

Senior Research Engineer, LLM Evaluation and Behavioral Analysis

Together AI is a research-driven artificial intelligence company focused on building advanced open-source-aligned LLMs. The Senior Research Engineer will develop evaluation systems to ensure model reliability and performance, collaborating closely with various teams to shape datasets and influence model improvements.

Artificial Intelligence (AI), Generative AI, Internet, IT Infrastructure, Open Source
H1B Sponsor Likely

Responsibilities

Build and iterate on evaluation frameworks that measure model performance across instruction following, function calling, long-context reasoning, multi-turn dialog, safety, and agentic behaviors
Develop specialized evaluation suites for:
Function calling: argument correctness, schema adherence, tool selection, multi-function planning, and error recovery
Agentic workflows: task decomposition, multi-step planning, self-correction, and autonomous tool-use sequences
Tool-augmented interactions: search, retrieval, code execution, and API-driven actions
Create automated CI/CD pipelines for A/B comparisons, regression detection, behavioral drift monitoring, and adversarial probing
Design and curate high-quality evaluation datasets, especially nuanced or challenging cases across domains
Collaborate with researchers and engineers to diagnose failures, triage regressions, and guide data selection and shaping strategies, objective design, and system improvements
Work with engineering teams to build dashboards, reports, and internal tools that help visualize behavior changes across releases
Operate in a fast-paced, high-impact environment with deep technical ownership and close partnership with world-class model researchers and infra engineers

Qualifications

Python, LLMs experience, Evaluation tooling, Experiment design, Distributed workflows, GPU environments, Multi-turn reasoning, Behavioral analysis

Required

Strong engineering skills with Python, evaluation tooling, and distributed workflows
Experience working with LLMs or transformer-based models, particularly in model evaluation, testing, or red-teaming
Ability to reason clearly about qualitative behavior, edge cases, and model failure patterns
Experience designing experiments, building datasets, and interpreting noisy behavioral signals
Understanding of function calling and structured output formats
Familiarity with GPU or distributed compute environments
Hands-on experience evaluating function-calling models, agentic systems, or tool-augmented LLM pipelines
Experience with multi-turn or multi-step reasoning tasks
Familiarity with inference systems, distributed infrastructure, or post-training workflows
Passion for discovering subtle behaviors, surprising model gaps, or edge-case failures

Benefits

Competitive compensation
Startup equity
Health insurance
Other benefits

Company

Together AI

Together AI is a cloud-based platform designed for constructing open-source generative AI and infrastructure for developing AI models.

H1B Sponsorship

Together AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (19)
2024 (6)
2023 (3)

Funding

Current Stage: Growth Stage
Total Funding: $533.5M
Key Investors: Salesforce Ventures, Lux Capital
2025-02-20: Series B, $305M
2024-03-13: Series A, $106M
2023-11-29: Series A, $102.5M

Leadership Team

Vipul Ved Prakash, Co-Founder & CEO
Kae Ike Lim, Executive Assistant to Co-Founder and CEO
Company data provided by Crunchbase