- Yale BIDS Awarded $2.7 Million NIH Grant to Develop AI Explainability Tools for Clinical Decision-Making (September 17, 2025; Source: NIH Reporter)
- Yale Team Recognized in NIH $1 Million Data Sharing Challenge (September 16, 2025; Source: NIH)
- Hua Xu, PhD, Receives NIH Supplement to Advance Mental Health Research (July 01, 2025)
Clinical Natural Language Processing (NLP) Lab
Our lab is dedicated to advancing natural language processing (NLP) through the development of novel methods, robust software, and real-world applications across a range of biomedical texts, including clinical notes, scientific literature, and social media. These three areas are closely interconnected: innovative methods inform the creation of widely used software; that software supports clinical applications; and insights from those applications highlight new challenges, guiding the development of future methods. Together, they form a dynamic and collaborative ecosystem that drives our research in clinical NLP.
Upcoming Events
Speakers to be announced.
Past Events
NLP/LLM Interest Group: Lingfei Qian, PhD, and Xueqing Peng, PhD
This session will feature two talks:
1. Accelerating Cohort Identification from EHRs with Biomedical Knowledge and LLMs by Lingfei Qian, PhD
Abstract: Identifying eligible patients from electronic health records (EHRs) is a key challenge in clinical research. We present a framework that combines large language models (LLMs), Text-to-SQL, and retrieval-augmented generation (RAG) to streamline cohort identification. Eligibility criteria are first decomposed and partially translated into structured queries via Text-to-SQL, providing a preliminary selection from OMOP-formatted EHR data. The core innovation is a RAG/QA component that retrieves and assesses patient-level evidence from both clinical notes and structured tables, emphasizing nuanced evaluation of complex criteria such as disease chronicity, lab thresholds, and clinical stability, while supporting interactive cohort exploration and detailed patient-level evidence review. This workflow reduces manual effort, improves accuracy, and offers a scalable, clinically grounded solution for EHR-based cohort identification.
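To make the two-stage workflow concrete, here is a minimal Python sketch of the Text-to-SQL and RAG/QA steps under stated assumptions: the `llm_complete` stub, the OMOP table names mentioned in the prompts, and the MET / NOT MET / UNCLEAR answer format are illustrative placeholders, not the presenters' actual implementation.

```python
# Illustrative sketch of an LLM-assisted cohort-identification workflow.
# Function names, prompts, and table names are assumptions for illustration only.

import sqlite3


def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a large language model (plug in your own client)."""
    raise NotImplementedError("Connect an LLM API client here.")


def criterion_to_sql(criterion: str) -> str:
    """Text-to-SQL step: translate a structured eligibility criterion into a SQL
    query over OMOP CDM tables."""
    prompt = (
        "Translate this eligibility criterion into a SQL query over the OMOP CDM "
        "tables person, condition_occurrence, and measurement. Return SQL only.\n"
        f"Criterion: {criterion}"
    )
    return llm_complete(prompt)


def preliminary_cohort(conn: sqlite3.Connection, criteria: list[str]) -> set[int]:
    """Intersect the person_ids returned by each criterion's generated SQL query."""
    cohort = None
    for criterion in criteria:
        sql = criterion_to_sql(criterion)
        ids = {row[0] for row in conn.execute(sql)}
        cohort = ids if cohort is None else cohort & ids
    return cohort or set()


def assess_with_rag(retrieved_notes: list[str], criterion: str) -> str:
    """RAG/QA step: ask the LLM whether retrieved note snippets support a nuanced
    criterion such as disease chronicity or clinical stability."""
    context = "\n---\n".join(retrieved_notes[:5])  # top-k retrieved snippets
    prompt = (
        f"Criterion: {criterion}\n"
        f"Evidence from clinical notes:\n{context}\n"
        "Answer MET, NOT MET, or UNCLEAR, and cite the supporting sentence."
    )
    return llm_complete(prompt)
```

The sketch mirrors the described split: structured criteria are screened cheaply with generated SQL, while nuanced criteria are deferred to evidence retrieval plus LLM judgment over clinical notes.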
2. An Information Extraction Approach to Detecting Novelty of Biomedical Publications by Xueqing Peng, PhD
Abstract: Scientific novelty plays a critical role in shaping research impact, yet it remains inconsistently defined and difficult to quantify. Existing approaches often reduce novelty to a single measure, failing to distinguish the specific types of contributions (such as new concepts or relationships) that drive influence. In this study, we introduce a semantic measure of novelty based on the emergence of new biomedical entities and relationships within the conclusion sections of research articles. Leveraging transformer-based named entity recognition (NER) and relation extraction (RE) tools, we identify novel findings and classify articles into four categories: No Novelty, Entity-only Novelty, Relation-only Novelty, and Entity-Relation Novelty. We evaluate this framework using citation counts and Journal Impact Factors (JIF) as proxies for research influence. Our results show that Entity-Relation Novelty articles receive the highest citation impact, with relation novelty more closely aligned with high-impact journals. These findings offer a scalable framework for assessing novelty and guiding future research evaluation.
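As a small illustration of the four categories, the rule below maps the presence or absence of newly extracted entities and relations to a novelty label. The upstream NER/RE models and the comparison against prior literature that would produce these sets are not shown; the example relation is a generic placeholder.

```python
# Illustrative mapping from extracted novel entities/relations to the four
# novelty categories described above.

def classify_novelty(new_entities: set[str],
                     new_relations: set[tuple[str, str, str]]) -> str:
    """Assign one of the four novelty categories based on what is newly observed."""
    if new_entities and new_relations:
        return "Entity-Relation Novelty"
    if new_entities:
        return "Entity-only Novelty"
    if new_relations:
        return "Relation-only Novelty"
    return "No Novelty"


# Example: a conclusion section introducing one previously unseen relation
# between two already-known entities (placeholder names).
print(classify_novelty(set(), {("drug_a", "treats", "disease_b")}))
# -> "Relation-only Novelty"
```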
NLP/LLM Interest Group: Yang Ren, PhD
"A Prompt Library for Efficient Clinical Entity Recognition Using Large Language Models"
A Prompt Library for Efficient Clinical Entity Recognition Using Large Language Models by Yang Ren, PhD
Abstract: Large Language Models (LLMs) hold strong potential for clinical information extraction (IE), but their evaluation is often limited by manually crafted prompts and the need for annotated data. We developed an automated framework that extracts entity-level schema information from published clinical IE studies to construct structured prompts. Using literature covering 44 diseases and over 100 entities, we generated prompts to evaluate multiple LLMs under few-shot and fine-tuned settings. Models prompted with schema-derived information consistently outperformed baselines using generic prompts across tasks. Our results demonstrate the value of structured prompting for robust and reproducible LLM evaluation in diverse clinical IE applications.
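A rough sketch of how a schema-derived prompt might be assembled is shown below. The `EntitySchema` fields, the prompt wording, and the example entity types are hypothetical placeholders, not the actual prompt library described in the talk.

```python
# Illustrative construction of a structured NER prompt from an entity schema
# mined from the clinical IE literature (field names and wording are assumptions).

from dataclasses import dataclass


@dataclass
class EntitySchema:
    entity_type: str     # e.g., "Medication"
    definition: str      # description taken from a published clinical IE study
    examples: list[str]  # few-shot example mentions


def build_prompt(schemas: list[EntitySchema], note: str) -> str:
    """Compose a structured few-shot prompt for clinical entity recognition."""
    lines = ["Extract the following entity types from the clinical note."]
    for s in schemas:
        lines.append(
            f"- {s.entity_type}: {s.definition} (examples: {', '.join(s.examples)})"
        )
    lines.append("Return one JSON object mapping each entity type to a list of mentions.")
    lines.append(f"Clinical note:\n{note}")
    return "\n".join(lines)


schemas = [
    EntitySchema("Medication", "Any drug name mentioned in the note",
                 ["aspirin", "metformin"]),
    EntitySchema("Dosage", "Amount and frequency of a prescribed drug",
                 ["81 mg daily", "500 mg twice daily"]),
]
print(build_prompt(schemas, "Patient started on aspirin 81 mg daily."))
```

The same prompt template can be reused across diseases by swapping in the schema entries extracted for each study, which is what makes the evaluation reproducible without hand-written prompts.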
Principal Investigator
Contact Information
- Email