Everyone (Public)

NLP/LLM Interest Group

Name: NLP/LLM Interest Group
Start: 2026-02-09T21:00:00.0000000Z
End: 2026-02-09T22:00:00.0000000Z
Location: Yale University

Diagnostic Accuracy and Clinical Reasoning of Multiple Large Language Models

101 College Street

The Zoom link and passcode will be shared by email.

Title: Diagnostic Accuracy and Clinical Reasoning of Multiple Large Language Models

Abstract: Large language models are increasingly used for mental health–related questions, yet their performance in psychiatry, where diagnosis depends heavily on narrative interpretation and clinical reasoning, remains poorly understood.

In this talk, I’ll present a mixed-methods evaluation of four contemporary LLMs on 196 psychiatric case vignettes, combining large-scale diagnostic accuracy metrics with clinician-rated assessments of diagnostic reasoning. We find that models can achieve high diagnostic accuracy on vignettes but, crucially, that clinician-rated reasoning quality is far more predictive of diagnostic correctness than surface-level data extraction. These findings suggest that evaluating how models reason, not just what they predict, is essential for understanding their potential role in psychiatric decision support.

Kevin Jin is a third-year PhD student in the Interdepartmental Program in Computational Biology and Biomedical Informatics at Yale University. He is advised by Hua Xu in the Clinical NLP Lab, a research group in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. He completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020. He is supported by the NSF Graduate Research Fellowship.