Everyone (Public)

NLP/LLM Interest Group

Diagnostic Accuracy and Clinical Reasoning of Multiple Large Language Models

Abstract: Large language models are increasingly used for mental health–related questions, yet their performance in psychiatry, where diagnosis depends heavily on narrative interpretation and clinical reasoning, remains poorly understood.


In this talk, I’ll present a mixed-methods evaluation of four contemporary LLMs on 196 psychiatric case vignettes, combining large-scale diagnostic accuracy metrics with clinician-rated assessments of diagnostic reasoning. We find that models can achieve high diagnostic accuracy on vignettes but, crucially, that clinician-rated reasoning quality is far more predictive of diagnostic correctness than surface-level data extraction. These findings suggest that evaluating how models reason, not just what they predict, is essential for understanding their potential role in psychiatric decision support.


Kevin Jin is a third-year PhD student in the Interdepartmental Program in Computational Biology and Biomedical Informatics at Yale University. He is advised by Hua Xu in the Clinical NLP Lab, a research group in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. He completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020. He is supported by the NSF Graduate Research Fellowship.

Admission

Free

Event Type

Lectures and Seminars

Next upcoming occurrences of this event

Monday, Feb 9, 2026