Research Projects

COVID-19 Research:

Utilize social media data to estimate social distancing (SD). Our recent work has shown that Twitter data contain valuable information for estimating different facets of SD in spatiotemporal settings [14], including purpose, implementation, social disruption, adaptation, and emotional response to SD, as presented on the project’s dashboard. The plan is to extend this work and build predictive models of COVID-19 cases and deaths, either by adjusting compartmental SEIR transmission dynamics models or by building hybrid predictive models based on statistics and machine learning.
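To make the SEIR option concrete, the following is a minimal sketch of how an SD estimate could modify a compartmental model. It is an illustration under stated assumptions, not the project's actual model: the function name `simulate_seir`, the parameter values, and the choice to scale the transmission rate by `(1 - distancing)` are all hypothetical, and the integration uses a simple forward-Euler step.

```python
def simulate_seir(population, initial_infected, beta, sigma, gamma,
                  days, distancing=None, dt=0.1):
    """Forward-Euler SEIR simulation (illustrative sketch only).

    beta:  transmission rate per day (assumed value, not fitted)
    sigma: rate of leaving the exposed compartment (1 / incubation period)
    gamma: recovery rate (1 / infectious period)
    distancing: optional list with one value in [0, 1] per day; the
        effective transmission rate is beta * (1 - distancing[day]).
        This scaling is an assumption made for illustration.
    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    """
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    steps_per_day = int(round(1.0 / dt))
    history = [(s, e, i, r)]
    for day in range(days):
        reduction = distancing[day] if distancing is not None else 0.0
        beta_eff = beta * (1.0 - reduction)
        for _ in range(steps_per_day):
            new_exposed = beta_eff * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Compare a baseline run with a run where distancing halves transmission.
baseline = simulate_seir(1_000_000, 10, beta=0.5, sigma=0.2, gamma=0.1, days=200)
distanced = simulate_seir(1_000_000, 10, beta=0.5, sigma=0.2, gamma=0.1,
                          days=200, distancing=[0.5] * 200)
peak_baseline = max(state[2] for state in baseline)
peak_distanced = max(state[2] for state in distanced)
print(f"peak infectious, baseline:  {peak_baseline:.0f}")
print(f"peak infectious, distanced: {peak_distanced:.0f}")
```

In practice the per-day `distancing` values would have to be derived from the Twitter-based SD estimates, and the rates would be fitted to observed case and death counts rather than fixed by hand.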

COVID-19 Twitter Analysis

Mine Twitter data for misinformation related to COVID-19 vaccines and identify anti-vaxxers across the US. Findings from this project can help policy makers better direct public health messaging related to vaccination.

Develop multi-modal prediction models based on deep learning approaches that combine structured clinical variables, unstructured clinical text, and imaging data to improve prognostic prediction of four COVID-19-related outcomes (levels of care): no admission, hospital admission, intubation, and mortality.

Phenotyping different types of headaches:

The goal of this project is to characterize the different types of headaches and the associated pharmacological and non-pharmacological therapies. Many patients report headache symptoms and ways of treating those symptoms, as documented in their clinical notes, but do not receive an International Classification of Diseases (ICD) diagnostic code for their headaches. We utilize natural language processing (NLP) and machine learning (ML) approaches to conduct a “deeper dive” into unstructured clinical notes to explore the entirety of the documented symptoms and treatments of headaches.

Exploiting Medline articles for gene molecular function prediction:

The goal is to develop predictive models that automatically assign molecular functions to genes using the biomedical literature. We have shown in previous work that PubMed abstracts can be used to classify genes by function using a multi-label classification approach. Our method performed best compared with existing models. We aim to further improve our classification model.

Despite its potential benefits, social media content has been studied in isolation from patients’ electronic health records (EHR), making it difficult to fully understand important behavioral outcomes, including fatal and nonfatal opioid overdose and suicide. Sharing social media data for research purposes has been shown to be feasible: patients approached at a Yale addiction recovery clinic consented to share their social media activity. The goal of this project is to build a combined patient database by collecting patients’ social media content and linking it to their clinical information from the EHR.

Characterizing communication between patients and healthcare providers:

Effective and rapid Patient-Provider Communication (PPC) is key to improving population health management and patient-centered outcomes. Timely communication makes it easier to closely monitor patients’ adherence and to react swiftly to changes in health.

In this project, using machine learning and natural language processing, we will automate the process of extracting patterns of communications such as symptoms, adverse events, medications, emotions, and expressions of empathy.