
Clinical Natural Language Processing (NLP) Lab

Our lab is dedicated to advancing natural language processing (NLP) through the development of novel methods, robust software, and real-world applications across a range of biomedical texts, including clinical notes, scientific literature, and social media. These three areas are closely interconnected: innovative methods inform the creation of widely used software; that software supports clinical applications; and insights from those applications highlight new challenges, guiding the development of future methods. Together, they form a dynamic and collaborative ecosystem that drives our research in clinical NLP.

Upcoming Events

Monday, January 26, 2026
Monday, February 2, 2026
Monday, February 9, 2026
Monday, February 16, 2026
Monday, February 23, 2026
Monday, March 2, 2026

Past Events

Monday, January 19, 2026
Monday, January 5, 2026
Monday, December 22, 2025
  • Everyone
    Lingfei Qian, PhD, and Xueqing Peng, PhD

    NLP/LLM Interest Group

    This session will feature two exciting talks:

    1. Accelerating Cohort Identification from EHRs with Biomedical Knowledge and LLMs by Lingfei Qian, PhD

    Abstract: Identifying eligible patients from electronic health records (EHRs) is a key challenge in clinical research. We present a framework that combines large language models (LLMs), Text-to-SQL, and retrieval-augmented generation (RAG) to streamline cohort identification. Eligibility criteria are first decomposed and partially translated into structured queries via Text-to-SQL, providing a preliminary selection from OMOP-formatted EHR data. The core innovation is a RAG/QA component that retrieves and assesses patient-level evidence from both clinical notes and structured tables, enabling nuanced evaluation of complex criteria such as disease chronicity, lab thresholds, and clinical stability, while supporting interactive cohort exploration and detailed patient-level evidence review. This workflow reduces manual effort, improves accuracy, and offers a scalable, clinically grounded solution for EHR-based cohort identification.
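
    A minimal sketch of the flow this abstract describes, assuming a DB-API connection to OMOP-formatted tables, an llm(prompt) callable that returns text, and a note retriever; all helper names are hypothetical and are not the speakers' implementation:

        def criterion_to_sql(llm, criterion: str) -> str:
            """Text-to-SQL step: translate one eligibility criterion into SQL over OMOP tables."""
            prompt = (
                "Translate this eligibility criterion into a SQL query over the OMOP "
                "condition_occurrence and measurement tables, returning person_id:\n"
                f"{criterion}"
            )
            return llm(prompt)

        def preliminary_cohort(conn, llm, criteria: list[str]) -> set[int]:
            """Intersect the patient sets returned by each SQL-translatable criterion."""
            cohort = None
            for criterion in criteria:
                ids = {row[0] for row in conn.execute(criterion_to_sql(llm, criterion))}
                cohort = ids if cohort is None else cohort & ids
            return cohort or set()

        def satisfies_criterion(llm, retrieve_notes, person_id: int, criterion: str) -> bool:
            """RAG/QA step: retrieve note snippets for a patient and let the LLM judge the criterion."""
            evidence = "\n".join(retrieve_notes(person_id, criterion))
            answer = llm(
                f"Criterion: {criterion}\nEvidence from the patient's notes:\n{evidence}\n"
                "Does this patient satisfy the criterion? Answer yes or no."
            )
            return answer.strip().lower().startswith("yes")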

    2. An Information Extraction Approach to Detecting Novelty of Biomedical Publications by Xueqing Peng, PhD

    Abstract: Scientific novelty plays a critical role in shaping research impact, yet it remains inconsistently defined and difficult to quantify. Existing approaches often reduce novelty to a single measure, failing to distinguish the specific types of contributions (such as new concepts or relationships) that drive influence. In this study, we introduce a semantic measure of novelty based on the emergence of new biomedical entities and relationships within the conclusion sections of research articles. Leveraging transformer-based named entity recognition (NER) and relation extraction (RE) tools, we identify novel findings and classify articles into four categories: No Novelty, Entity-only Novelty, Relation-only Novelty, and Entity-Relation Novelty. We evaluate this framework using citation counts and Journal Impact Factors (JIF) as proxies for research influence. Our results show that Entity-Relation Novelty articles receive the highest citation impact, with relation novelty more closely aligned with high-impact journals. These findings offer a scalable framework for assessing novelty and guiding future research evaluation.
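
    The four-way labeling can be sketched as a simple set comparison, assuming entities and (subject, relation, object) triples have already been extracted by the NER and RE tools; the data structures below are illustrative assumptions, not the study's code:

        def label_novelty(entities: set, relations: set,
                          known_entities: set, known_relations: set) -> str:
            """Classify an article by whether its conclusion introduces new entities and/or new relations."""
            new_entities = entities - known_entities
            new_relations = relations - known_relations
            if new_entities and new_relations:
                return "Entity-Relation Novelty"
            if new_entities:
                return "Entity-only Novelty"
            if new_relations:
                return "Relation-only Novelty"
            return "No Novelty"

        # Toy example: one previously unseen entity, no extracted relations
        print(label_novelty({"geneX"}, set(), {"TP53"}, set()))  # Entity-only Novelty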

Monday, December 15, 2025
  • Everyone
    Yang Ren, PhD

    NLP/LLM Interest Group

    "A Prompt Library for Efficient Clinical Entity Recognition Using Large Language Models"

    A Prompt Library for Efficient Clinical Entity Recognition Using Large Language Models by Yang Ren, PhD

    Abstract: Large Language Models (LLMs) hold strong potential for clinical information extraction (IE), but their evaluation is often limited by manually crafted prompts and the need for annotated data. We developed an automated framework that extracts entity-level schema information from published clinical IE studies to construct structured prompts. Using literature covering 44 diseases and over 100 entities, we generated prompts to evaluate multiple LLMs under few-shot and fine-tuned settings. Models prompted with schema-derived information consistently outperformed baselines that used generic prompts across tasks. Our results demonstrate the value of structured prompting for robust and reproducible LLM evaluation in diverse clinical IE applications.
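
    A minimal sketch of composing a schema-derived NER prompt of the kind described above; the schema format, template wording, and function name are assumptions rather than the framework's actual interface:

        def build_schema_prompt(disease: str, entity_schema: dict, few_shot: list) -> str:
            """Build a structured prompt from entity definitions mined from published clinical IE studies."""
            lines = [f"Extract the following entities from a clinical note about {disease}:"]
            for entity, definition in entity_schema.items():
                lines.append(f"- {entity}: {definition}")
            for note, gold in few_shot:  # optional few-shot examples
                lines.append(f"\nExample note: {note}\nExpected output: {gold}")
            # {note} is left as a placeholder to be filled with the target note at inference time
            lines.append("\nNote: {note}\nReturn the extracted entities as JSON.")
            return "\n".join(lines)

        # Toy usage: two-entity schema and one few-shot example
        prompt = build_schema_prompt(
            "type 2 diabetes",
            {"Medication": "drug name and dose", "HbA1c": "glycated hemoglobin value"},
            [("Metformin 500 mg daily; HbA1c 7.2%.",
              {"Medication": ["Metformin 500 mg"], "HbA1c": ["7.2%"]})],
        )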