Independent physician-led evaluations across real-world medical use cases.
Three independent board-certified physicians score every response.
Raters are blinded to which model produced the output.
Responses are assessed across 11 criteria, organized into groups.
Evaluation uses a 1 to 5 scale.
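As a rough illustration of how ratings like these could be combined, the sketch below takes three raters' scores for one response, checks that each rater supplied 11 scores on the 1 to 5 scale, and averages them. The function name `aggregate_scores` and the choice of a simple mean are assumptions made for this example; the source does not state how scores are aggregated.

```python
from statistics import mean

NUM_RATERS = 3        # from the text: three physicians score every response
NUM_CRITERIA = 11     # from the text: responses are assessed across 11 criteria
SCALE_MIN, SCALE_MAX = 1, 5  # from the text: 1 to 5 scale


def aggregate_scores(ratings):
    """Combine one response's ratings (hypothetical helper).

    ratings: a list of NUM_RATERS lists, each holding NUM_CRITERIA
    integer scores between SCALE_MIN and SCALE_MAX.
    Returns (per_criterion_means, overall_mean).
    Averaging is an assumption, not the documented method.
    """
    if len(ratings) != NUM_RATERS:
        raise ValueError(f"expected {NUM_RATERS} raters, got {len(ratings)}")
    for rater_scores in ratings:
        if len(rater_scores) != NUM_CRITERIA:
            raise ValueError(f"expected {NUM_CRITERIA} scores per rater")
        for score in rater_scores:
            if not isinstance(score, int) or not SCALE_MIN <= score <= SCALE_MAX:
                raise ValueError(f"score out of range: {score!r}")

    # zip(*ratings) regroups the scores so each tuple holds one criterion's
    # scores from all raters.
    per_criterion = [mean(scores) for scores in zip(*ratings)]
    overall = mean(per_criterion)
    return per_criterion, overall


if __name__ == "__main__":
    example = [[4] * NUM_CRITERIA, [5] * NUM_CRITERIA, [3] * NUM_CRITERIA]
    per_criterion, overall = aggregate_scores(example)
    print(per_criterion, overall)
```

Because raters are blinded, a function like this would only ever see scores keyed by response, with the model identity joined back in after aggregation.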
AI evaluation for general internal medicine diagnostics and treatment
AI evaluation across text and image cases to measure diagnostic accuracy and safety.
Error analysis of AI model performance in dentistry