- January 26, 2026
Kalpana Raja Awarded NIH SBIR Grant to Assess Immunology Dataset Quality
- September 17, 2025 | Source: NIH Reporter
Yale BIDS Awarded $2.7 Million NIH Grant to Develop AI Explainability Tools for Clinical Decision-Making
- September 16, 2025 | Source: NIH
Yale Team Recognized in NIH $1 Million Data Sharing Challenge
Clinical Natural Language Processing (NLP) Lab
Our lab is dedicated to advancing natural language processing (NLP) through the development of novel methods, robust software, and real-world applications across a range of biomedical texts, including clinical notes, scientific literature, and social media. These three areas are closely interconnected: innovative methods inform the creation of widely used software; that software supports clinical applications; and insights from those applications highlight new challenges, guiding the development of future methods. Together, they form a dynamic and collaborative ecosystem that drives our research in clinical NLP.
Upcoming Events
Huan He, PhD | NLP/LLM Interest Group
"Rethinking User Interface Design in the Era of AI Agents"
Abstract: Artificial intelligence agents are rapidly reshaping how users interact with digital systems. From embedded copilots to autonomous task executors, agents are no longer confined to chat interfaces; they are becoming integral components of modern user interfaces. In this talk, we will share a series of real-world cases and practical lessons drawn from building agent-driven systems in research and data-intensive environments. We will examine how agents are currently embedded into interfaces, what architectural decisions influence usability and trust, and what design trade-offs emerge when combining autonomy with human control. We will also discuss how AI agent tools themselves are transforming the UI design workflow, from rapid prototyping to code generation and interaction simulation.
Huan He, PhD, is a research scientist in biomedical informatics and data science at Yale University School of Medicine. His primary research areas revolve around visual analytics and their applications in healthcare-related research. Currently, his work is focused on designing and developing visual analytics systems using natural language processing (NLP) and machine learning (ML) technologies, with the goal of facilitating data exploration for health-related clinical questions.
Yih Chung Tham, PhD | NLP/LLM Interest Group
"The Evolution and Future of Ocular Image-Based Foundation Models"
Abstract: This talk reviews the current landscape of ocular image-based foundation models and examines emerging future directions.
In this talk, I will discuss how large-scale pretraining has enabled improved generalization, label efficiency, and cross-disease performance in retinal imaging tasks. Beyond current capabilities, I will explore key trends shaping the next phase of development, including modality-specific versus multimodal architectures, global-scale pretraining across diverse populations, and integration with language models for clinical reasoning. Finally, I will address benchmarking, validation, and translational challenges that must be addressed to move foundation models from research innovation to real-world ophthalmic care.
Yih Chung Tham, PhD, is a Presidential Young Professor and a clinician scientist in the Department of Ophthalmology at the Yong Loo Lin School of Medicine, National University of Singapore (NUS). At NUS Medicine’s Centre for Innovation & Precision Eye Health, he holds dual leadership roles as Co-Lead for Population Data Science and Program Director for Optometry Education.
His research focuses on big data analytics, ocular imaging, deep learning, and large language models in ophthalmology. He has published more than 350 peer-reviewed articles in leading journals such as Nature Medicine, Nature Biomedical Engineering, Nature Aging, Lancet Digital Health, and Ophthalmology, with an H-index of 65. Among his most influential works is the Global Glaucoma Burden study, one of the most highly cited ophthalmology-related papers of all time, with over 8,500 citations. Since 2021, he has been consistently recognized among the world’s top 2% most cited scientists.
Speakers to be announced.
Speakers to be announced.
Speakers to be announced.
Speakers to be announced.
Past Events
Kevin Jin | NLP/LLM Interest Group
Diagnostic Accuracy and Clinical Reasoning of Multiple Large Language Models
Abstract: Large language models are increasingly used for mental health–related questions, yet their performance in psychiatry, where diagnosis depends heavily on narrative interpretation and clinical reasoning, remains poorly understood.
In this talk, I’ll present a mixed-methods evaluation of four contemporary LLMs on 196 psychiatric case vignettes, combining large-scale diagnostic accuracy metrics with clinician-rated assessments of diagnostic reasoning. We find that models can achieve high diagnostic accuracy on vignettes, but, crucially, that clinician-rated reasoning quality is far more predictive of diagnostic correctness than surface-level data extraction. These findings suggest that evaluating how models reason, not just what they predict, is essential for understanding their potential role in psychiatric decision support.
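The two quantitative pieces of a mixed-methods design like the one described above, diagnostic accuracy over vignettes and the association between a clinician-rated reasoning score and per-case correctness, can be sketched as follows. This is an illustrative outline only; the function names and example data are hypothetical and are not the study's code or results.

```python
# Minimal sketch of the evaluation metrics (illustrative, not the study's code).
from statistics import mean

def accuracy(predictions, gold):
    """Fraction of vignettes where the model's diagnosis matches the reference."""
    return mean(int(p == g) for p, g in zip(predictions, gold))

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 correctness vector in ys this is the
    point-biserial correlation between reasoning quality and correctness."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A positive correlation between clinician-rated reasoning scores and the correctness vector is the kind of evidence behind the claim that reasoning quality predicts diagnostic accuracy.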
Kevin Jin is a third-year PhD student in the Interdepartmental Program in Computational Biology and Biomedical Informatics at Yale University. He is advised by Hua Xu in the Clinical NLP Lab, a research group in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. He completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020. He is supported by the NSF Graduate Research Fellowship.
Lingfei Qian, PhD, and Xueqing Peng, PhD | NLP/LLM Interest Group
This session will feature two exciting talks:
1. Accelerating Cohort Identification from EHRs with Biomedical Knowledge and LLMs by Lingfei Qian, PhD
Abstract: Identifying eligible patients from electronic health records (EHRs) is a key challenge in clinical research. We present a framework that combines large language models (LLMs), Text-to-SQL, and retrieval-augmented generation (RAG) to streamline cohort identification. Eligibility criteria are first decomposed and partially translated into structured queries via Text-to-SQL, providing a preliminary selection from OMOP-formatted EHR data. The core innovation focuses on RAG/QA to retrieve and assess patient-level evidence from both clinical notes and structured tables, emphasizing nuanced evaluation of complex criteria like disease chronicity, lab thresholds, and clinical stability, while supporting interactive cohort exploration and detailed patient-level evidence review. This workflow reduces manual effort, improves accuracy, and offers a scalable, clinically grounded solution for EHR-based cohort identification.
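The two-stage workflow in this abstract, structured preselection via Text-to-SQL followed by RAG/QA assessment of the remaining nuanced criteria, can be sketched as below. The routing rule, function names, and criterion split are hypothetical placeholders, not the authors' implementation; `run_sql` and `rag_assess` stand in for the Text-to-SQL and RAG/QA components.

```python
# Illustrative sketch of hybrid cohort identification (assumed interfaces).
STRUCTURED_KEYWORDS = ("age", "lab", "diagnosis code")

def is_structured(criterion: str) -> bool:
    # Crude routing rule: criteria mentioning structured fields go to Text-to-SQL.
    return any(k in criterion.lower() for k in STRUCTURED_KEYWORDS)

def identify_cohort(criteria, run_sql, rag_assess, all_patients):
    """Stage 1: Text-to-SQL over OMOP tables narrows the candidate pool.
    Stage 2: RAG/QA over patient-level evidence checks nuanced criteria."""
    candidates = set(all_patients)
    for c in criteria:
        if is_structured(c):
            candidates &= set(run_sql(c))  # structured preselection
    cohort = []
    for pid in candidates:
        # Each remaining criterion is assessed from notes and tables.
        if all(rag_assess(pid, c) for c in criteria if not is_structured(c)):
            cohort.append(pid)
    return cohort
```

The intersection in stage 1 keeps only patients satisfying every SQL-translatable criterion, so the expensive evidence-retrieval step runs over a much smaller candidate set.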
2. An Information Extraction Approach to Detecting Novelty of Biomedical Publications by Xueqing Peng, PhD
Abstract: Scientific novelty plays a critical role in shaping research impact, yet it remains inconsistently defined and difficult to quantify. Existing approaches often reduce novelty to a single measure, failing to distinguish the specific types of contributions (such as new concepts or relationships) that drive influence. In this study, we introduce a semantic measure of novelty based on the emergence of new biomedical entities and relationships within the conclusion sections of research articles. Leveraging transformer-based named entity recognition (NER) and relation extraction (RE) tools, we identify novel findings and classify articles into four categories: No Novelty, Entity-only Novelty, Relation-only Novelty, and Entity-Relation Novelty. We evaluate this framework using citation counts and Journal Impact Factors (JIF) as proxies for research influence. Our results show that Entity-Relation Novelty articles receive the highest citation impact, with relation novelty more closely aligned with high-impact journals. These findings offer a scalable framework for assessing novelty and guiding future research evaluation.
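The four-way classification described in this abstract reduces, once NER and RE have run, to comparing an article's extracted entities and relations against what is already known. The sketch below illustrates only that final step under assumed set-based inputs; the transformer-based extraction itself, and how "known" entities and relations are compiled, are outside its scope.

```python
# Illustrative sketch of the four novelty categories (assumed set-based inputs).
def classify_novelty(entities: set, relations: set,
                     known_entities: set, known_relations: set) -> str:
    """Label an article by whether its conclusion introduces new
    biomedical entities and/or new entity-entity relationships."""
    new_entities = entities - known_entities
    new_relations = relations - known_relations
    if new_entities and new_relations:
        return "Entity-Relation Novelty"
    if new_entities:
        return "Entity-only Novelty"
    if new_relations:
        return "Relation-only Novelty"
    return "No Novelty"
```

Relations are naturally represented as (head, relation, tail) triples, so a previously seen entity pair linked by a new relation type still counts as relation novelty.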
Principal Investigator
Contact Information
- Email