NLP/LLM Interest Group
This session will feature two exciting talks:
1. Accelerating Cohort Identification from EHRs with Biomedical Knowledge and LLMs by Lingfei Qian, PhD
Abstract: Identifying eligible patients from electronic health records (EHRs) is a key challenge in clinical research. We present a framework that combines large language models (LLMs), Text-to-SQL, and retrieval-augmented generation (RAG) to streamline cohort identification. Eligibility criteria are first decomposed and partially translated into structured queries via Text-to-SQL, providing a preliminary selection from OMOP-formatted EHR data. The core innovation focuses on RAG/QA to retrieve and assess patient-level evidence from both clinical notes and structured tables, emphasizing nuanced evaluation of complex criteria like disease chronicity, lab thresholds, and clinical stability, while supporting interactive cohort exploration and detailed patient-level evidence review. This workflow reduces manual effort, improves accuracy, and offers a scalable, clinically grounded solution for EHR-based cohort identification.
2. An Information Extraction Approach to Detecting Novelty of Biomedical Publications by Xueqing Peng, PhD
Abstract: Scientific novelty plays a critical role in shaping research impact, yet it remains inconsistently defined and difficult to quantify. Existing approaches often reduce novelty to a single measure, failing to distinguish the specific types of contributions (such as new concepts or relationships) that drive influence. In this study, we introduce a semantic measure of novelty based on the emergence of new biomedical entities and relationships within the conclusion sections of research articles. Leveraging transformer-based named entity recognition (NER) and relation extraction (RE) tools, we identify novel findings and classify articles into four categories: No Novelty, Entity-only Novelty, Relation-only Novelty, and Entity-Relation Novelty. We evaluate this framework using citation counts and Journal Impact Factors (JIF) as proxies for research influence. Our results show that Entity-Relation Novelty articles receive the highest citation impact, with relation novelty more closely aligned with high-impact journals. These findings offer a scalable framework for assessing novelty and guiding future research evaluation.
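As a rough illustration of the four-way classification described in the second abstract, the sketch below labels an article by checking whether its extracted entities and relations have been seen before. It is not the speakers' implementation: the function name `classify_novelty`, the `(head, relation type, tail)` triple format, and the use of plain Python sets as the "previously seen" knowledge base are all assumptions made for illustration, and the NER and RE steps are assumed to have already run.

```python
from typing import Set, Tuple

# Hypothetical representation: a relation is a (head, relation type, tail) triple.
Relation = Tuple[str, str, str]


def classify_novelty(
    entities: Set[str],
    relations: Set[Relation],
    known_entities: Set[str],
    known_relations: Set[Relation],
) -> str:
    """Assign one of the four novelty categories named in the abstract.

    Assumption: an entity or relation counts as novel if it is absent
    from the set of previously seen items.
    """
    has_new_entity = bool(entities - known_entities)
    has_new_relation = bool(relations - known_relations)

    if has_new_entity and has_new_relation:
        return "Entity-Relation Novelty"
    if has_new_entity:
        return "Entity-only Novelty"
    if has_new_relation:
        return "Relation-only Novelty"
    return "No Novelty"


# Toy data, invented for illustration only.
known_entities = {"metformin", "type 2 diabetes"}
known_relations = {("metformin", "treats", "type 2 diabetes")}

label = classify_novelty(
    entities={"metformin", "type 2 diabetes"},
    relations={("metformin", "prevents", "type 2 diabetes")},
    known_entities=known_entities,
    known_relations=known_relations,
)
print(label)
```

In this toy case both entities are already known but the relation triple is not, so the article would fall under Relation-only Novelty. How the actual study treats a new relation that involves a new entity is not stated in the abstract, so the handling here is only one possible reading.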
Host Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab
- Yale School of Medicine