Steampunk, Inc. · 20 hours ago

Senior AI Evaluation Scientist

McLean, VA

Full-time

Onsite

Senior Level, Lead/Staff

$135K/yr - $170K/yr

8+ years exp

Steampunk, Inc. is a Change Agent in the Federal contracting industry, focusing on innovative solutions for clients in various sectors. The company is seeking a Senior AI Evaluation Scientist to design and lead evaluation programs for AI systems, ensuring accuracy and alignment with mission outcomes.

Consulting · Information Technology

Growth Opportunities

No H1B

U.S. Citizen Only

Responsibilities

Lead the design and implementation of comprehensive evaluation frameworks for generative and predictive AI models, covering accuracy, robustness, relevance, trustworthiness, fairness, hallucination rates, and safety

Develop and maintain automated evaluation pipelines that continuously audit model outputs, monitor quality drift, and validate alignment with mission-specific constraints

Create custom benchmark datasets, challenge sets, and adversarial evaluation strategies tailored to client domains and regulatory requirements

Conduct in-depth error analysis, model behavior studies, and sensitivity assessments to inform iterative improvements in prompts, retrieval systems, models, and orchestration frameworks

Partner with AI Product Engineers, LLMOps Engineers, and Data Scientists to drive model improvements through structured experimentation, A/B testing, and scientifically grounded evaluation cycles

Advise teams on measurement methodologies, statistical significance, and best practices for Trustworthy AI evaluation in alignment with NIST AI RMF, MLSecOps, and agency governance requirements

Document evaluation results, risks, and findings for technical and non-technical audiences, including engineering teams, leadership, and government clients

Contribute to the development of standardized tools, reusable templates, and evaluation components to improve repeatability and quality across engagements

Stay informed of advances in LLM assessment, safety science, red-teaming methodologies, and evaluation frameworks emerging from academia and industry

Mentor junior evaluation staff and help grow Steampunk’s AI measurement and evaluation capabilities

Qualifications

Machine Learning Evaluation · Python · AI Evaluation Frameworks · Automated Evaluation Pipelines · Statistical Testing · NLP · Generative AI · Dataset Construction · Analytical Skills · Agile Development · Cross-Functional Collaboration · Communication Skills

Required

Ability to hold a position of public trust with the U.S. government

Bachelor's degree and 8 years of experience

5+ years of experience evaluating machine learning, NLP, or generative AI systems, with strong familiarity with LLMs and retrieval-based architectures

Deep understanding of evaluation metrics, statistical testing, dataset construction, experimental design, and model validation methodologies

Hands-on experience with Python and libraries such as PyTorch, Hugging Face, LangChain, scikit-learn, and evaluation tooling (LLM-as-a-judge, rubric-based evaluators, or custom harnesses)

Proficiency in AI evaluation frameworks such as Ragas

Demonstrated experience designing automated evaluation pipelines and integrating them into CI/CD or LLMOps workflows

Strong understanding of AI governance, responsible AI principles, bias detection, fairness metrics, and risk identification

Experience working with structured and unstructured datasets across multiple modalities (text, tabular, documents)

Familiarity with vector databases, RAG architectures, and multi-step LLM workflows

Familiarity with OWASP LLM Top 10 Risks

Excellent analytical, written, and verbal communication skills, with the ability to translate evaluation insights into clear technical recommendations

Proven ability to collaborate with cross-functional engineering and product teams while independently driving evaluation strategy

Experience working in agile or iterative development environments and documenting scientific processes clearly

Company

Steampunk, Inc.

Glassdoor: 4.4

Steampunk is anchored by a startup culture with a customer-centered delivery approach. We put our Federal government clients at the center of everything we design, develop, and deliver to drive high-quality mission impacts and user experiences at speed.

Founded in 2019

Washington, District of Columbia, USA

201-500 employees

https://steampunk.com/

Funding

Current Stage

Growth Stage

Total Funding

unknown

Key Investors

AcceliCITY powered by Leading Cities

2024-07-31 · Non Equity Assistance

Leadership Team

Matt Warren

CEO

Mike Saliter

Executive Vice President - Homeland, Commerce, & Justice

Recent News

PR Newswire

Steampunk promotes Stefani Shepherd to Vice President -- Homeland, Commerce, and Justice Portfolio

2025-11-20

Washington Technology

CBP picks 3 for $900M IT, enterprise business support pact

2025-10-01

PR Newswire

Steampunk Awarded BOA on USDA STRATUS Program

2024-05-21

Company data provided by Crunchbase