Ongoing Projects
Foundation Large Language Models
Our lab applies large language models (LLMs), including LLaMA, GPT, and others, to extract meaningful insights from a range of medical texts, such as electronic health records, scientific publications, and other sources of biomedical information.
Unlike general-purpose LLMs, our models are trained specifically on biomedical data, which makes them more accurate and effective at interpreting and generating specialized medical content.
AI-Powered Visual Platform
In the rapidly advancing field of biomedicine, researchers face an overwhelming volume of publications, including scientific articles, books, technical reports, and working papers. This growing body of knowledge makes effective navigation and exploration difficult. While search engines have improved literature retrieval, gaps remain in supporting deeper understanding and exploration of biomedical knowledge at scale.
To help address this need, we developed MedViz, a visual analytics tool that uses large language models to help researchers explore semantic relationships within large collections of publications. By combining advanced natural language processing with interactive visualizations, MedViz provides an intuitive way to navigate and make sense of complex biomedical literature.
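MedViz's internal pipeline is not described here. As a generic illustration only, one common way to expose semantic relationships between publications is to represent each one as an embedding vector (for example, produced by a language model) and link pairs whose cosine similarity exceeds a threshold. The sketch below assumes such vectors already exist; the document IDs, the 3-dimensional toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def similarity_edges(
    embeddings: dict[str, list[float]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return (doc_a, doc_b, score) for every pair whose similarity meets the threshold.

    Edges like these could back a graph or map view of a literature collection.
    """
    ids = sorted(embeddings)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:  # each unordered pair is compared once
            score = cosine_similarity(embeddings[a], embeddings[b])
            if score >= threshold:
                edges.append((a, b, round(score, 3)))
    return edges


# Hypothetical 3-dimensional embeddings; real model embeddings have hundreds of dimensions.
toy_embeddings = {
    "paper_A": [0.9, 0.1, 0.0],
    "paper_B": [0.8, 0.2, 0.1],
    "paper_C": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for edge in similarity_edges(toy_embeddings, threshold=0.8):
        print(edge)
```

With these toy vectors only paper_A and paper_B are linked; the pairwise loop is quadratic, so a real system over a large corpus would more likely use an approximate nearest-neighbor index.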
NLP for Real-World Studies
Clinical documentation in electronic health records contains essential details about patients and their care. Natural language processing (NLP) helps unlock this information, supporting real-world studies.
The OHDSI NLP Working Group develops methods and tools for integrating clinical text into observational research. In a paper on this work, the group presents a framework for representing and using textual data within the OMOP Common Data Model (CDM), including workflows to extract, transform, and load (ETL) clinical notes into OMOP tables.
The paper also shares use cases from large consortia and institutions and discusses challenges and lessons learned to guide future NLP implementations in real-world research.
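The OMOP CDM stores clinical notes in its NOTE table and NLP-derived mentions in its NOTE_NLP table. The sketch below is not the working group's framework; it is a minimal illustration, assuming a toy dictionary matcher, of the "transform" step that turns one note into NOTE_NLP-style rows. Only a subset of each table's columns is modeled, the term dictionary and its concept IDs are hypothetical placeholders (not real OMOP vocabulary IDs), and the negation check is a crude NegEx-style heuristic chosen for brevity.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Note:
    # Subset of columns from the OMOP CDM NOTE table.
    note_id: int
    person_id: int
    note_date: date
    note_text: str


@dataclass
class NoteNlpRow:
    # Subset of columns from the OMOP CDM NOTE_NLP table.
    note_nlp_id: int
    note_id: int
    snippet: str
    offset: str  # character position of the mention, kept as a string here
    lexical_variant: str
    note_nlp_concept_id: int
    nlp_system: str
    nlp_date: date
    term_exists: str  # "Y" = asserted mention, "N" = negated mention


# Hypothetical dictionary: concept IDs are placeholders, not real OMOP concepts.
TERM_TO_CONCEPT = {"dementia": 1001, "hypertension": 1002}
# Crude negation cues searched in a short window before each mention (assumption).
NEGATION_CUES = ("no ", "denies ", "negative for ")
WINDOW = 20  # characters of left context used for snippets and negation


def extract_note_nlp(note: Note, start_id: int, run_date: date) -> list[NoteNlpRow]:
    """Turn one NOTE record into NOTE_NLP-style rows via whole-word dictionary lookup."""
    rows = []
    text_lower = note.note_text.lower()
    next_id = start_id
    for term, concept_id in TERM_TO_CONCEPT.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text_lower):
            start, end = match.span()
            left_context = text_lower[max(0, start - WINDOW):start]
            negated = any(cue in left_context for cue in NEGATION_CUES)
            rows.append(NoteNlpRow(
                note_nlp_id=next_id,
                note_id=note.note_id,
                snippet=note.note_text[max(0, start - WINDOW):end],
                offset=str(start),
                lexical_variant=note.note_text[start:end],
                note_nlp_concept_id=concept_id,
                nlp_system="toy-dictionary-v0",  # hypothetical system name
                nlp_date=run_date,
                term_exists="N" if negated else "Y",
            ))
            next_id += 1
    return rows


if __name__ == "__main__":
    example = Note(1, 42, date(2024, 1, 1),
                   "Patient denies hypertension. Family history of dementia.")
    for row in extract_note_nlp(example, start_id=100, run_date=date(2024, 1, 2)):
        print(row)
```

In this example "hypertension" is recorded with term_exists "N" because "denies" precedes it, while "dementia" is recorded as asserted. A production pipeline would replace the dictionary with a real NLP system and map mentions to standard vocabulary concepts before loading.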
Language, Lifestyle and Brain Health Study
Learn about this study on dementia, cognitive decline, and bilingualism.
Grants
Current
- U24MH136069 (07/15/2024 - 04/30/2029): Coordinating Individually Measured Phenotypes to Advance Mental Health Research
- U24LM013755 (12/21/2020 - 11/30/2025): RADx-Rad Discoveries & Data: Consortium Coordination Center Program Organization
- R01AG073435 (09/15/2021 - 05/31/2026): TRiPOD: Toward Reusable Phenotypes in Observational Data for AD/ADRD - managing definitions and correcting bias
- R01AG078154 (09/01/2022 - 05/31/2027): Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD
- U24MH130988 (09/01/2022 - 06/30/2027): Engagement and outreach to achieve a FAIR data ecosystem for the BICAN
- R01AG080429 (02/15/2023 - 01/31/2028): Leveraging Longitudinal Data and Informatics Technology to Understand the Role of Bilingualism in Cognitive Resilience, Aging and Dementia
Past
- R56AG069880 (12/21/2020 - 11/30/2024): Advancing Drug Repositioning for Alzheimer's Disease using Real-world Data
- RF1AG072799 (05/01/2021 - 04/30/2024): Facilitate Observational Studies of Alzheimer's Disease and Alzheimer's Disease-Related Dementias Using Ontology and Natural Language Processing
- R01LM013519 (09/01/2021 - 05/31/2025): PheBC: bias correction methods for EHR derived phenotype
- U2COD023196 (07/01/2016 - 06/30/2021): Partnership in Learning around Engagement, Data, Genomics, and Environment
- U24CA194215 (09/01/2016 - 08/31/2021): Advancing Cancer Pharmacoepidemiology Research through EHRs and Informatics
- R01LM011829 (09/01/2014 - 08/31/2018): Patient Medical History Representation, Extraction, and Inference from EHR Data
- R01HS022895 (09/30/2014 - 09/29/2019): Learning from patient safety events: A case based toolkit
- R01LM010681 (05/31/2010 - 09/28/2018): Interactive machine learning methods for clinical natural language processing
- R01GM103859 (09/18/2014 - 05/31/2018): Informatics Tools for Pharmacogenomic Discovery using Practice-based Data
- R01GM102282 (04/01/2013 - 03/31/2017): Natural Language Processing for Clinical and Translational Research
- U24AI117966 (09/29/2014 - 08/31/2017): BioCADDIE: Biomedical and healthCAre Data Discovery and Indexing Ecosystem
- U01CA180964 (09/01/2013 - 08/31/2016): Informatics to enable routine personalized cancer therapy
- R01LM011563 (09/01/2013 - 08/31/2016): Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance