Sully.ai
Applied Research Engineer (Contract to Hire)
Sully.ai is focused on revolutionizing healthcare by building AI teammates that enhance clinician capabilities. The Applied Research Engineer will be responsible for developing and scaling automated evaluation pipelines with clinical-grade benchmarks, ensuring high performance and reliability in AI applications for healthcare.
Artificial Intelligence (AI) · Health Care · Hospital · Machine Learning · Software
Responsibilities
Build and scale automated evaluation pipelines (LLM-as-judge + human review) with clinical-grade benchmarks
Audit existing evaluation approaches for clinical and agentic tasks
Define initial benchmarks and build early automated pipelines
Partner with engineering to land the first set of CI gates for accuracy, factuality, and safety
Deliver a repeatable evaluation framework with automated pipelines in production
Demonstrate measurable improvements in robustness, hallucination reduction, or safety
Publish or present internal research findings that directly shape product reliability
Qualifications
Required
Proven experience designing agentic processes and LLM evaluation/benchmarking frameworks
Strong Python and ML background (PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex)
Demonstrated ability to design rigorous experiments and translate findings into production
Track record of published research or deep applied work in LLMs and agent evaluation
Strong communication and technical writing skills to articulate complex findings clearly