odiggo · 4 hours ago
Applied Research Engineer
Sully.ai is focused on building impactful healthcare solutions using AI technology. The Applied Research Engineer will design and implement automated evaluation pipelines to enhance the reliability and effectiveness of AI agents in clinical settings.
Computer Software
Responsibilities
Build and scale automated evaluation pipelines (LLM-as-judge + human review) with clinical-grade benchmarks
Audit existing evaluation approaches for clinical and agentic tasks
Define initial benchmarks and build early automated pipelines
Partner with engineering to land the first set of CI gates for accuracy, factuality, and safety
Deliver a repeatable evaluation framework with automated pipelines in production
Demonstrate measurable improvements in robustness, hallucination reduction, or safety
Publish or present internal research findings that directly shape product reliability
Qualifications
Required
Proven experience designing agentic processes and LLM evaluation/benchmarking frameworks
Strong Python and ML background (PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex)
Demonstrated ability to design rigorous experiments and translate findings into production
Track record of published research or deep applied work in LLMs and agent evaluation
Strong communication and technical writing skills to articulate complex findings clearly
Company
odiggo
Car Services in minutes
Funding
Current Stage
Early Stage
Company data provided by Crunchbase