NLP/LLM Interest Group
Diagnostic Accuracy and Clinical Reasoning of Multiple Large Language Models
Abstract: Large language models are increasingly used for mental health–related questions, yet their performance in psychiatry, where diagnosis depends heavily on narrative interpretation and clinical reasoning, remains poorly understood.
In this talk, I’ll present a mixed-methods evaluation of four contemporary LLMs on 196 psychiatric case vignettes, combining large-scale diagnostic accuracy metrics with clinician-rated assessments of diagnostic reasoning. We find that models can achieve high diagnostic accuracy on vignettes but, crucially, that clinician-rated reasoning quality is far more predictive of diagnostic correctness than surface-level data extraction. These findings suggest that evaluating how models reason, not just what they predict, is essential for understanding their potential role in psychiatric decision support.
Kevin Jin is a third-year PhD student in the Interdepartmental Program in Computational Biology and Biomedical Informatics at Yale University. He is advised by Hua Xu in the Clinical NLP Lab, a research group in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. He completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020. He is supported by the NSF Graduate Research Fellowship.
Host Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab