Ongoing Projects

Foundation Large Language Models

Our lab applies advanced large language models (LLMs)—including LLaMA, GPT, and others—to extract meaningful insights from a range of medical texts. These include electronic health records, scientific publications, and other sources of biomedical information.

Unlike general-purpose LLMs, our models are specifically trained on biomedical data, making them more accurate and effective in interpreting and generating specialized medical content.

AI-Powered Visual Platform

In the rapidly advancing field of biomedicine, researchers face an overwhelming volume of publications—including scientific articles, books, technical reports, and working papers. This growing body of knowledge presents significant challenges for effective navigation and exploration. While various search engines have improved literature retrieval, gaps remain in enabling deeper understanding and exploration of biomedical knowledge at scale.

To help address this need, we developed MedViz, a visual analytics tool that uses large language models to explore the semantic relationships within vast collections of publications. By combining advanced natural language processing with interactive visualizations, MedViz provides researchers with an intuitive way to navigate and make sense of complex biomedical literature.

Try MedViz!

NLP for Real World Studies

Clinical documentation in electronic health records contains essential details about patients and their care. Natural language processing (NLP) helps unlock this information, supporting real-world studies.

The OHDSI NLP Working Group develops methods and tools to integrate clinical text into observational research. This paper presents a framework for representing and using textual data within the OMOP Common Data Model (CDM), including workflows to extract, transform, and load (ETL) clinical notes into OMOP tables.

We also share use cases from large consortia and institutions, and discuss challenges and lessons learned to guide future NLP implementations in real-world research.
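As a rough illustration of the kind of ETL step described above, the sketch below loads one clinical note into a NOTE-like record and writes term mentions to NOTE_NLP-like records. It is a minimal sketch under stated assumptions, not the Working Group's implementation: the field names are a subset based on our reading of the OMOP CDM NOTE and NOTE_NLP tables and should be checked against the CDM version in use, while the lexicon, its concept IDs, the `toy-matcher-0.1` system name, and the `extract_mentions` helper are all hypothetical. The matcher does no negation or context detection, so every match is marked as present.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class NoteRow:
    """Subset of OMOP NOTE columns (assumed; check your CDM version)."""
    note_id: int
    person_id: int
    note_date: date
    note_title: str
    note_text: str


@dataclass
class NoteNlpRow:
    """Subset of OMOP NOTE_NLP columns (assumed; check your CDM version)."""
    note_nlp_id: int
    note_id: int
    snippet: str
    offset: str  # stored as text in this sketch
    lexical_variant: str
    note_nlp_concept_id: int
    nlp_system: str
    nlp_date: date
    term_exists: str


# Hypothetical lexicon: these IDs are placeholders, not real OMOP concept IDs.
TOY_LEXICON = {"dementia": 1001, "hypertension": 1002}


def extract_mentions(note, run_date, start_id=1, window=20):
    """Return one NoteNlpRow per lexicon term found in the note text."""
    rows = []
    next_id = start_id
    for term, concept_id in TOY_LEXICON.items():
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(note.note_text):
            # Keep a little surrounding context as the snippet.
            lo = max(0, match.start() - window)
            hi = min(len(note.note_text), match.end() + window)
            rows.append(
                NoteNlpRow(
                    note_nlp_id=next_id,
                    note_id=note.note_id,
                    snippet=note.note_text[lo:hi],
                    offset=str(match.start()),
                    lexical_variant=match.group(0),
                    note_nlp_concept_id=concept_id,
                    nlp_system="toy-matcher-0.1",  # hypothetical system name
                    nlp_date=run_date,
                    term_exists="Y",  # no negation handling in this sketch
                )
            )
            next_id += 1
    return rows


if __name__ == "__main__":
    example = NoteRow(
        note_id=1,
        person_id=42,
        note_date=date(2024, 1, 5),
        note_title="Progress note",
        note_text="Patient has hypertension and early dementia.",
    )
    for row in extract_mentions(example, run_date=date(2024, 1, 6)):
        print(row)
```

A production pipeline would replace the dictionary matcher with a full clinical NLP system and map each mention to standard vocabulary concepts before loading.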

Language, Lifestyle and Brain Health Study

Learn about this study on dementia, cognitive decline, and bilingualism.

Current

U24MH136069 (07/15/2024 - 04/30/2029): Coordinating Individually Measured Phenotypes to Advance Mental Health Research
U24LM013755 (12/21/2020 - 11/30/2025): RADx-Rad Discoveries & Data: Consortium Coordination Center Program Organization
R01AG073435 (9/15/2021 - 5/31/2026): TRiPOD: Toward Reusable Phenotypes in Observational Data for AD/ADRD - managing definitions and correcting bias
R01AG078154 (9/1/2022 - 5/31/2027): Detecting synergistic effects of pharmacological and non-pharmacological interventions for AD/ADRD
U24MH130988 (9/1/2022 - 6/30/2027): Engagement and outreach to achieve a FAIR data ecosystem for the BICAN
R01AG080429 (2/15/2023 - 1/31/2028): Leveraging Longitudinal Data and Informatics Technology to Understand the Role of Bilingualism in Cognitive Resilience, Aging and Dementia

Past

R56AG069880 (12/21/2020 - 11/30/2024): Advancing Drug Repositioning for Alzheimer’s Disease using Real-world Data
RF1AG072799 (5/1/2021 - 4/30/2024): Facilitate Observational Studies of Alzheimer's Disease and Alzheimer's Disease-Related Dementias Using Ontology and Natural Language Processing
R01LM013519 (9/1/2021 - 5/31/2025): PheBC: bias correction methods for EHR derived phenotype
U2COD023196 (07/01/2016 - 06/30/2021): Partnership in Learning around Engagement, Data, Genomics, and Environment
U24CA194215 (09/01/2016 - 08/31/2021): Advancing Cancer Pharmacoepidemiology Research through EHRs and Informatics
R01LM011829 (09/01/2014 - 08/31/2018): Patient Medical History Representation, Extraction, and Inference from EHR Data
R01HS022895 (09/30/2014 - 09/29/2019): Learning from patient safety events: A case based toolkit
R01LM010681 (05/31/2010 - 09/28/2018): Interactive machine learning methods for clinical natural language processing
R01GM103859 (09/18/2014 - 5/31/2018): Informatics Tools for Pharmacogenomic Discovery using Practice-based Data
R01GM102282 (04/01/2013 - 03/31/2017): Natural Language Processing for Clinical and Translational Research
U24AI117966 (09/29/2014 - 08/31/2017): BioCADDIE: Biomedical and healthCAre Data Discovery and Indexing Ecosystem
U01CA180964 (09/01/2013 - 08/31/2016): Informatics to enable routine personalized cancer therapy
R01LM011563 (09/01/2013 - 08/31/2016): Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance