by Weiqi Wang, Baifeng Wang, Yan Zhu, Zhe Wang, Suyuan Peng

Introduction
Standardized medical examinations, used to assess trainees' clinical competencies, provide a rigorous means of verifying the accuracy and reliability of large language models (LLMs) in medical contexts. Although current evaluations use these exams to test LLMs’ clinical reasoning, performance varies significantly across clinical scenarios.
Existing evaluation methods struggle to adapt to evolving research needs. This study synthesizes prior research on LLMs in medical exams, highlighting current limitations and proposing future research directions.
Methods and analysis
The protocol was developed in line with the standards set out in the JBI Manual for Evidence Synthesis. After establishing precise inclusion and exclusion criteria and search strategies, we will run systematic searches in the PubMed and Web of Science Core Collection databases.
The method encompasses literature review, data extraction, analytical frameworks, and process mapping. Following it is intended to maintain methodological rigor throughout the research process.
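As an illustration of how a search strategy for a database such as PubMed might be assembled, the sketch below joins synonyms within a concept with OR and joins the concepts with AND. The term lists and the helper names `or_group` and `build_query` are hypothetical and are not taken from the protocol; only the `[Title/Abstract]` field tag is standard PubMed syntax.

```python
from typing import Sequence


def or_group(terms: Sequence[str], field: str = "Title/Abstract") -> str:
    """Quote each term, tag it with a PubMed field, and join the terms with OR."""
    if not terms:
        raise ValueError("each concept needs at least one term")
    tagged = [f'"{term}"[{field}]' for term in terms]
    return "(" + " OR ".join(tagged) + ")"


def build_query(concepts: Sequence[Sequence[str]]) -> str:
    """Join the concept groups with AND so every concept must match."""
    return " AND ".join(or_group(concept) for concept in concepts)


# Hypothetical term lists for illustration only; the protocol's actual
# search strategy is not reproduced here.
LLM_TERMS = ["large language model", "ChatGPT"]
EXAM_TERMS = ["medical examination", "licensing examination"]

query = build_query([LLM_TERMS, EXAM_TERMS])
print(query)
```

Web of Science uses a different query syntax, so the same concept groups would need their own formatter there.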
Ethics and dissemination
This protocol describes a method for performing a scoping review. The investigation focuses on the organized synthesis and examination of previously published research.
PLOS ONE (Medicine) published a clinical update in Research Highlights on 22 Apr 2026.
The item focuses on Evaluation of large language models in medical examinations: A scoping review protocol.