- September 17, 2025 (Source: NIH Reporter)
Yale BIDS Awarded $2.7 Million NIH Grant to Develop AI Explainability Tools for Clinical Decision-Making
- September 16, 2025 (Source: NIH)
Yale Team Recognized in NIH $1 Million Data Sharing Challenge
- July 01, 2025
Hua Xu, PhD, Receives NIH Supplement to Advance Mental Health Research
Clinical Natural Language Processing (NLP) Lab
Our lab is dedicated to advancing natural language processing (NLP) through the development of novel methods, robust software, and real-world applications across a range of biomedical texts, including clinical notes, scientific literature, and social media. These three areas are closely interconnected: innovative methods inform the creation of widely used software; that software supports clinical applications; and insights from those applications highlight new challenges, guiding the development of future methods. Together, they form a dynamic and collaborative ecosystem that drives our research in clinical NLP.
Upcoming Events
Speakers to be announced.
Past Events
NLP/LLM Interest Group: Lingfei Qian, PhD and Xueqing Peng, PhD
This session will feature two exciting talks:
1. Accelerating Cohort Identification from EHRs with Biomedical Knowledge and LLMs by Lingfei Qian, PhD
Abstract: Identifying eligible patients from electronic health records (EHRs) is a key challenge in clinical research. We present a framework that combines large language models (LLMs), Text-to-SQL, and retrieval-augmented generation (RAG) to streamline cohort identification. Eligibility criteria are first decomposed and partially translated into structured queries via Text-to-SQL, providing a preliminary selection from OMOP-formatted EHR data. The core innovation is a RAG-based question-answering component that retrieves and assesses patient-level evidence from both clinical notes and structured tables, emphasizing nuanced evaluation of complex criteria such as disease chronicity, lab thresholds, and clinical stability, while supporting interactive cohort exploration and detailed patient-level evidence review. This workflow reduces manual effort, improves accuracy, and offers a scalable, clinically grounded solution for EHR-based cohort identification.
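A rough illustration of the two-stage flow the abstract describes, assuming a toy OMOP-style store; every function name below is an invented stand-in, not the authors' code:

```python
import sqlite3

def text_to_sql(criterion: str) -> str:
    """Stand-in for the LLM Text-to-SQL step over OMOP-formatted tables."""
    # A real system would prompt an LLM; here one criterion is hand-mapped.
    assert criterion == "age >= 18"
    return ("SELECT person_id FROM person "
            "WHERE (2025 - year_of_birth) >= 18")

def rag_check(criterion: str, notes: list[str]) -> bool:
    """Stand-in for retrieval-augmented QA over one patient's notes."""
    # Real systems retrieve note chunks and ask an LLM to judge nuanced
    # criteria such as chronicity, lab thresholds, or clinical stability.
    return any(criterion.lower() in n.lower() for n in notes)

def identify_cohort(conn, structured, free_text, notes_by_patient):
    candidates = None                                  # stage 1: SQL pre-filter
    for c in structured:
        ids = {row[0] for row in conn.execute(text_to_sql(c))}
        candidates = ids if candidates is None else candidates & ids
    return [pid for pid in sorted(candidates or [])    # stage 2: note-level QA
            if all(rag_check(c, notes_by_patient.get(pid, []))
                   for c in free_text)]

conn = sqlite3.connect(":memory:")                     # toy OMOP-style store
conn.execute("CREATE TABLE person (person_id INT, year_of_birth INT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, 1980), (2, 2015)])
notes = {1: ["Chronic kidney disease, stable for 2 years."]}
print(identify_cohort(conn, ["age >= 18"], ["chronic kidney disease"], notes))
# -> [1]
```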
2. An Information Extraction Approach to Detecting Novelty of Biomedical Publications by Xueqing Peng, PhD
Abstract: Scientific novelty plays a critical role in shaping research impact, yet it remains inconsistently defined and difficult to quantify. Existing approaches often reduce novelty to a single measure, failing to distinguish the specific types of contributions (such as new concepts or relationships) that drive influence. In this study, we introduce a semantic measure of novelty based on the emergence of new biomedical entities and relationships within the conclusion sections of research articles. Leveraging transformer-based named entity recognition (NER) and relation extraction (RE) tools, we identify novel findings and classify articles into four categories: No Novelty, Entity-only Novelty, Relation-only Novelty, and Entity-Relation Novelty. We evaluate this framework using citation counts and Journal Impact Factors (JIF) as proxies for research influence. Our results show that Entity-Relation Novelty articles receive the highest citation impact, with relation novelty more closely aligned with high-impact journals. These findings offer a scalable framework for assessing novelty and guiding future research evaluation.
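For concreteness, a minimal sketch of the four-way labeling the abstract describes; the rule below is a simplification for exposition, not the authors' exact criteria:

```python
# Label an article by whether its conclusion introduces entities and/or
# entity relations unseen in prior literature (illustrative rule only).
def novelty_category(entities, relations, known_entities, known_relations):
    new_entity = any(e not in known_entities for e in entities)
    new_relation = any(r not in known_relations for r in relations)
    if new_entity and new_relation:
        return "Entity-Relation Novelty"
    if new_entity:
        return "Entity-only Novelty"
    if new_relation:
        return "Relation-only Novelty"
    return "No Novelty"

# Toy example: one previously unseen entity, no new relations.
print(novelty_category(
    entities={"BRCA1", "novel-compound-X"},
    relations={("BRCA1", "associated_with", "breast cancer")},
    known_entities={"BRCA1"},
    known_relations={("BRCA1", "associated_with", "breast cancer")},
))  # -> "Entity-only Novelty"
```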
NLP/LLM Interest Group: Yang Ren, PhD
"A Prompt Library for Efficient Clinical Entity Recognition Using Large Language Models"
A Prompt Library for Efficient Clinical Entity Recognition Using Large Language Models by Yang Ren, PhD
Abstract: Large Language Models (LLMs) hold strong potential for clinical information extraction (IE), but their evaluation is often limited by manually crafted prompts and the need for annotated data. We developed an automated framework that extracts entity-level schema information from published clinical IE studies to construct structured prompts. Using literature covering 44 diseases and over 100 entities, we generated prompts to evaluate multiple LLMs under few-shot and fine-tuned settings. Models prompted with schema-derived information consistently outperformed baselines that used generic prompts across tasks. Our results demonstrate the value of structured prompting for robust and reproducible LLM evaluation in diverse clinical IE applications.
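A small sketch of how a schema-derived prompt might be assembled; the template and entity schema below are invented for illustration, and the actual prompt library's format is not shown here:

```python
# Turn an entity schema mined from a published clinical-IE study into a
# structured prompt for an LLM (illustrative template, not the paper's).
PROMPT_TEMPLATE = """You are a clinical information-extraction system.
Disease context: {disease}
Extract the following entities, one per line, as `TYPE: text span`.
{entity_definitions}

Note:
{note}

Entities:"""

def build_prompt(disease: str, schema: dict[str, str], note: str) -> str:
    """schema maps entity type -> definition taken from the source study."""
    defs = "\n".join(f"- {etype}: {desc}" for etype, desc in schema.items())
    return PROMPT_TEMPLATE.format(disease=disease,
                                  entity_definitions=defs, note=note)

print(build_prompt(
    "type 2 diabetes",
    {"MEDICATION": "drug names and dosages",
     "LAB": "laboratory tests and values, e.g. HbA1c"},
    "Patient on metformin 500 mg; HbA1c 7.2%.",
))
```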
NLP/LLM Interest Group: Fan Ma, PhD and Anran Li, PhD
This session will feature two exciting talks:
1. A Collaborative Reasoning Agent-based Framework with Built-in Verification for Safe Medical Decision-Making by Fan Ma, PhD
Abstract: Large language models (LLMs) have demonstrated expert-level capabilities on medical benchmarks, yet translating these achievements into clinical practice is impeded by persistent risks of hallucination and a lack of verifiable reasoning. While emerging agentic frameworks have begun to address these limitations through multi-step planning, existing systems often prioritize performance optimization over rigorous safety checks and fail to emulate the collective decision-making of multidisciplinary teams. To address these critical gaps, we introduce OpenDx, a multi-agent framework designed to bridge the divide between experimental prototypes and reliable clinical decision support. OpenDx is built upon three core principles: collaboration among specialized agents that simulate distinct clinical roles, integrated verification modules that strictly cross-check outputs for safety and consistency, and an architectural alignment with clinical auditability standards. We present the design and evaluation of OpenDx, demonstrating how structured collaboration significantly enhances reliability compared to baseline models. Our work advocates for a new paradigm of trustworthy medical AI, where performance gains are inseparable from the interpretability and safety assurances required for frontline healthcare deployment.
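As a hedged sketch only (OpenDx's internals are not shown here, and every name below is an assumption), the collaborate-then-verify pattern the abstract describes might look like:

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    role: str        # simulated clinical specialty
    diagnosis: str
    rationale: str

def specialist(role: str, case: str) -> Opinion:
    """Stand-in for an LLM agent simulating one clinical role."""
    return Opinion(role=role, diagnosis="community-acquired pneumonia",
                   rationale="fever, cough, focal consolidation on imaging")

def verify(opinions: list[Opinion]) -> bool:
    """Built-in verification: cross-check agent outputs for consistency
    before anything reaches the user (toy agreement rule)."""
    return len({o.diagnosis for o in opinions}) == 1

def decide(case: str, roles=("pulmonology", "radiology", "primary care")):
    opinions = [specialist(r, case) for r in roles]    # collaboration step
    if not verify(opinions):                           # safety over coverage
        return "ESCALATE: agents disagree; route to human review"
    return opinions[0].diagnosis

print(decide("62F, fever and productive cough, RLL opacity on chest X-ray"))
```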
2. A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine: Applications to Clinical Information Extraction by Anran Li, PhD
Abstract: Large language models (LLMs) are advancing medical applications such as patient question answering and diagnosis. Yet extracting structured information from unstructured clinical narratives across healthcare systems remains challenging. Current LLMs struggle with such clinical information extraction (IE) due to complex language, limited annotations, and data silos. We present a federated, model-agnostic framework for training LLMs in medicine, applied to clinical IE. The proposed Fed-MedLoRA enables parameter-efficient federated fine-tuning by transmitting only low-rank adapter parameters, substantially reducing communication and computation costs. Accuracy was assessed across five patient cohorts through comparisons with baselines for LLMs under (1) in-domain training and testing, (2) external patient cohorts, and (3) a case study on new-site adaptation using real-world clinical notes.
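To illustrate the communication pattern the abstract describes, a minimal sketch in which sites exchange only small low-rank adapter matrices; the real Fed-MedLoRA may weight sites, secure-aggregate, or differ in other details:

```python
import numpy as np

def local_update(adapter, site_data):
    """Stand-in for one site's parameter-efficient fine-tuning step."""
    A, B = adapter
    return A + 0.01 * np.random.randn(*A.shape), B   # placeholder "training"

def fed_avg(adapters):
    """Server aggregates only the small LoRA matrices, never full weights."""
    As, Bs = zip(*adapters)
    return np.mean(As, axis=0), np.mean(Bs, axis=0)

d, r = 4096, 8                                    # hidden size, LoRA rank
adapter = (np.zeros((d, r)), np.zeros((r, d)))    # delta W = A @ B
for _ in range(3):                                # federated rounds
    updates = [local_update(adapter, site) for site in ["siteA", "siteB"]]
    adapter = fed_avg(updates)
```

Because only the rank-r matrices A and B travel over the network, each round transmits on the order of 2·d·r parameters instead of d·d, which is the communication saving the abstract refers to.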
NLP/LLM Interest Group: Hyunjae Kim, PhD and Chia-Hsuan Chang, PhD
This session will feature two exciting talks:
1. Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights by Hyunjae Kim, PhD
Abstract: Retrieval-augmented generation (RAG) is widely adopted to keep medical LLMs current and verifiable, yet its effectiveness remains unclear. We present the first end-to-end, expert-annotated evaluation of RAG in medicine, systematically assessing the full pipeline across three stages: evidence retrieval, evidence selection, and response generation. Eighteen medical experts provided 80,502 annotations across 800 model outputs on 200 clinical queries.
Contrary to expectations, conventional RAG often degraded performance: only 22% of retrieved passages were relevant, evidence selection was weak, and factuality dropped by up to 6%. However, simple strategies such as evidence filtering and query reformulation improved performance by up to 12%.
Our findings challenge current RAG assumptions and highlight the need for deliberate system design in medical AI applications.
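For concreteness, a toy version of the "evidence filtering" mitigation mentioned above: score each retrieved passage before it reaches the generator and drop low scorers. The scorer and threshold here are placeholders, not the paper's setup:

```python
def filter_evidence(query, passages, score, threshold=0.5):
    """Keep only passages a relevance scorer rates above `threshold`."""
    return [p for p in passages if score(query, p) >= threshold]

# Trivial lexical-overlap scorer standing in for a trained relevance model
# (the study found only ~22% of retrieved passages were relevant).
def overlap(query, passage):
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

kept = filter_evidence(
    "first-line treatment for hypertension",
    ["Thiazide diuretics are first-line treatment for hypertension.",
     "The clinic parking garage closes at 9 pm."],
    score=overlap, threshold=0.3,
)
print(kept)  # only the clinically relevant passage survives
```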
2. TopicForest: Embedding-Driven Hierarchical Clustering and Labeling for Biomedical Literature by Chia-Hsuan Chang, PhD
Abstract: The vast and complex landscape of biomedical literature presents significant challenges for organization and interpretation. Current embedding-based topic models like BERTopic are limited to flat, single-granularity clusters, failing to capture the inherently nested, hierarchical structure of scientific subjects. We introduce TopicForest, a novel framework that captures this natural hierarchy by building a "forest of topic trees" directly from text embeddings.
TopicForest delivers high-quality topic clustering comparable to state-of-the-art flat models while providing the essential multi-scale resolution they lack. Through recursive topic labeling, the framework achieves efficient token usage and practical scalability for large corpora. This design provides researchers with an effective tool for exploring and visualizing hierarchical biomedical knowledge landscapes.
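A rough sketch of the recursive clustering idea; TopicForest's actual implementation is not shown here, and the splitter, stopping rule, and LLM-based recursive labeling step are all simplified away or assumed:

```python
import numpy as np
from dataclasses import dataclass, field
from sklearn.cluster import KMeans

@dataclass
class TopicNode:
    doc_ids: list
    children: list = field(default_factory=list)

def build_tree(embeddings, doc_ids, min_size=4, depth=0, max_depth=3):
    """Recursively split a cluster of document embeddings into subtopics."""
    node = TopicNode(doc_ids=list(doc_ids))
    if len(doc_ids) <= min_size or depth >= max_depth:
        return node                          # leaf: fine-grained topic
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
    for k in (0, 1):                         # one subtree per subcluster
        mask = labels == k
        node.children.append(build_tree(
            embeddings[mask],
            [d for d, m in zip(doc_ids, mask) if m],
            min_size, depth + 1, max_depth))
    return node

# A "forest" arises by running build_tree once per top-level cluster;
# in the talk's framework each node would also get a recursively built label.
docs = np.random.rand(40, 768)               # toy document embeddings
tree = build_tree(docs, list(range(40)))
```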
Principal Investigator
Contact Information
- Email