Clinical Natural Language Processing (NLP) Lab

Our lab is dedicated to advancing natural language processing (NLP) through the development of novel methods, robust software, and real-world applications across a range of biomedical texts, including clinical notes, scientific literature, and social media. These three areas are closely interconnected: innovative methods inform the creation of widely used software; that software supports clinical applications; and insights from those applications highlight new challenges, guiding the development of future methods. Together, they form a dynamic and collaborative ecosystem that drives our research in clinical NLP.

Upcoming Events

Feb 16, 2026 (Monday)
  • Everyone
    Huan He, PhD

    NLP/LLM Interest Group

    "Rethinking User Interface Design in the Era of AI Agents"

    Abstract:
    Artificial intelligence agents are rapidly reshaping how users interact with digital systems. From embedded copilots to autonomous task executors, agents are no longer confined to chat interfaces—they are becoming integral components of modern user interfaces.

    In this talk, we will share a series of real-world cases and practical lessons drawn from building agent-driven systems in research and data-intensive environments. We will examine how agents are currently embedded into interfaces, what architectural decisions influence usability and trust, and what design trade-offs emerge when combining autonomy with human control. We will also discuss how AI agent tools themselves are transforming the UI design workflow—from rapid prototyping to code generation and interaction simulation.


    Huan He, PhD, is a research scientist in biomedical informatics and data science at Yale University School of Medicine. His primary research areas revolve around visual analytics and their applications in healthcare-related research. Currently, his work is focused on designing and developing visual analytics systems using natural language processing (NLP) and machine learning (ML) technologies, with the goal of facilitating data exploration for health-related clinical questions.

Feb 24, 2026 (Tuesday)
  • Everyone
    Yih Chung Tham, PhD

    NLP/LLM Interest Group

    "The Evolution and Future of Ocular Image-Based Foundation Models"

    Abstract: This talk reviews the current landscape of ocular image-based foundation models and examines emerging future directions.

    In this talk, I will discuss how large-scale pretraining has enabled improved generalization, label efficiency, and cross-disease performance in retinal imaging tasks. Beyond current capabilities, I will explore key trends shaping the next phase of development, including modality-specific versus multimodal architectures, global-scale pretraining across diverse populations, and integration with language models for clinical reasoning. Finally, I will address benchmarking, validation, and translational challenges that must be addressed to move foundation models from research innovation to real-world ophthalmic care.


    Yih Chung Tham, PhD, is a Presidential Young Professor and a clinician scientist in the Department of Ophthalmology at the Yong Loo Lin School of Medicine, National University of Singapore (NUS). At NUS Medicine’s Centre for Innovation & Precision Eye Health, he holds dual leadership roles as Co-Lead for Population Data Science and Program Director for Optometry Education.

    His research focuses on big data analytics, ocular imaging, deep learning, and large language models in ophthalmology. He has published more than 350 peer-reviewed articles in leading journals such as Nature Medicine, Nature Biomedical Engineering, Nature Aging, Lancet Digital Health, and Ophthalmology, with an H-index of 65. Among his most influential works is the Global Glaucoma Burden study, one of the most highly cited ophthalmology-related papers of all time, with over 8,500 citations. Since 2021, he has been consistently recognized among the world’s top 2% most cited scientists.

Mar 2, 2026 (Monday)
Mar 9, 2026 (Monday)
Mar 16, 2026 (Monday)
Mar 23, 2026 (Monday)
Past Events

Feb 9, 2026 (Monday)
  • Everyone
    Kevin Jin

    NLP/LLM Interest Group

    Diagnostic Accuracy and Clinical Reasoning of Multiple Large Language Models

    Abstract: Large language models are increasingly used for mental health–related questions, yet their performance in psychiatry, where diagnosis depends heavily on narrative interpretation and clinical reasoning, remains poorly understood.


    In this talk, I’ll present a mixed-methods evaluation of four contemporary LLMs on 196 psychiatric case vignettes, combining large-scale diagnostic accuracy metrics with clinician-rated assessments of diagnostic reasoning. We find that models can achieve high diagnostic accuracy on vignettes but, crucially, that clinician-rated reasoning quality is far more predictive of diagnostic correctness than surface-level data extraction. These findings suggest that evaluating how models reason, not just what they predict, is essential for understanding their potential role in psychiatric decision support.


    Kevin Jin is a third-year PhD student in the Interdepartmental Program in Computational Biology and Biomedical Informatics at Yale University. He is advised by Hua Xu in the Clinical NLP Lab, a research group in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. He completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020. He is supported by the NSF Graduate Research Fellowship.

Jan 19, 2026 (Monday)
Jan 5, 2026 (Monday)
Dec 22, 2025 (Monday)
  • Everyone
    Lingfei Qian, PhD, and Xueqing Peng, PhD

    NLP/LLM Interest Group

    This session will feature two exciting talks:

    1. Accelerating Cohort Identification from EHRs with Biomedical Knowledge and LLMs by Lingfei Qian, PhD

    Abstract: Identifying eligible patients from electronic health records (EHRs) is a key challenge in clinical research. We present a framework that combines large language models (LLMs), Text-to-SQL, and retrieval-augmented generation (RAG) to streamline cohort identification. Eligibility criteria are first decomposed and partially translated into structured queries via Text-to-SQL, providing a preliminary selection from OMOP-formatted EHR data. The core innovation focuses on RAG/QA to retrieve and assess patient-level evidence from both clinical notes and structured tables, emphasizing nuanced evaluation of complex criteria like disease chronicity, lab thresholds, and clinical stability, while supporting interactive cohort exploration and detailed patient-level evidence review. This workflow reduces manual effort, improves accuracy, and offers a scalable, clinically grounded solution for EHR-based cohort identification.
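The decomposition step described in the abstract can be illustrated with a minimal sketch. Everything below is a toy stand-in for the framework's LLM-driven components: the routing heuristic, table and column names, and the hardcoded SQL are hypothetical, not the system's actual Text-to-SQL or RAG output.

```python
# Toy sketch: split eligibility criteria into structured (SQL-answerable)
# checks and narrative checks that would go to RAG/QA over clinical notes.
# The keyword routing and the OMOP-style SQL below are illustrative only.

def decompose_criteria(criteria):
    """Route each criterion to a structured query or a note-level check."""
    structured, narrative = [], []
    for c in criteria:
        # Hypothetical heuristic: numeric/lab-style criteria go to SQL;
        # qualitative criteria (e.g., clinical stability) go to notes.
        if any(tok in c.lower() for tok in ("age", ">=", "<=", "lab")):
            structured.append(c)
        else:
            narrative.append(c)
    return structured, narrative

def to_sql(criterion):
    """Stand-in for LLM Text-to-SQL over OMOP-formatted tables."""
    if "age" in criterion.lower():
        return "SELECT person_id FROM person WHERE age >= 18"
    return f"-- needs manual translation: {criterion}"

criteria = [
    "Age >= 18",
    "Documented clinical stability over the past 6 months",
]
structured, narrative = decompose_criteria(criteria)
```

In the actual framework, both the routing and the SQL generation would be performed by an LLM, with RAG retrieving patient-level evidence for the narrative criteria.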

    2. An Information Extraction Approach to Detecting Novelty of Biomedical Publications by Xueqing Peng, PhD

    Abstract: Scientific novelty plays a critical role in shaping research impact, yet it remains inconsistently defined and difficult to quantify. Existing approaches often reduce novelty to a single measure, failing to distinguish the specific types of contributions (such as new concepts or relationships) that drive influence. In this study, we introduce a semantic measure of novelty based on the emergence of new biomedical entities and relationships within the conclusion sections of research articles. Leveraging transformer-based named entity recognition (NER) and relation extraction (RE) tools, we identify novel findings and classify articles into four categories: No Novelty, Entity-only Novelty, Relation-only Novelty, and Entity-Relation Novelty. We evaluate this framework using citation counts and Journal Impact Factors (JIF) as proxies for research influence. Our results show that Entity-Relation Novelty articles receive the highest citation impact, with relation novelty more closely aligned with high-impact journals. These findings offer a scalable framework for assessing novelty and guiding future research evaluation.
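The four-way categorization in the abstract can be sketched as a simple set comparison. The entity and relation sets below are toy placeholders; in the actual study they would come from transformer-based NER and RE tools applied to conclusion sections.

```python
# Toy sketch of the four novelty labels: compare an article's extracted
# entities and relations against those already known in the literature.

def novelty_category(entities, relations, known_entities, known_relations):
    """Assign one of the four novelty categories from the abstract."""
    new_entity = bool(set(entities) - set(known_entities))
    new_relation = bool(set(relations) - set(known_relations))
    if new_entity and new_relation:
        return "Entity-Relation Novelty"
    if new_entity:
        return "Entity-only Novelty"
    if new_relation:
        return "Relation-only Novelty"
    return "No Novelty"

# Hypothetical prior knowledge base.
known_e = {"metformin", "type 2 diabetes"}
known_r = {("metformin", "treats", "type 2 diabetes")}

# An article reporting a new drug and a new treats-relation.
cat = novelty_category(
    {"drug-X", "type 2 diabetes"},
    {("drug-X", "treats", "type 2 diabetes")},
    known_e, known_r,
)
```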
