Yale-BI Joint Research Committee (JRC) and Fellows

Current Fellows

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

    Topic: Multi-omics Analytics and Emerging Technologies

    Project Summary: Recent efforts have used CRISPR knockout or activation screens to identify targets that boost T cell effector function and further leverage immune killing. However, manipulating a single gene may still fail to overcome resistance, because other genes can compromise its function in immune cell signaling. Paralogs derived from the same ancestral gene have been reported to have synthetic lethal interactions and might function jointly in augmenting cancer immunity. In this project, we will establish a computational model for predicting paralog pairs that can act together to enhance cancer immunotherapy, by integrating genome-wide CRISPR screens and perturbation molecular profiles from the Cancer Dependency Map (DepMap) and the Connectivity Map (CMap) with cancer datasets from patients receiving immunotherapy. This research will deliver in silico tools for screening paralog pairs that can boost the immune response, which could inspire effective combination therapeutic strategies for precision treatment.

  • Dylan Duchen, PhD, MPH

    Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

    Topic: Multi-omics Analytics and Emerging Technologies

    Project Summary: Graph genome-based models can characterize genetic variation across both microbial organisms and diverse regions of the human genome. We aim to investigate whether these models can also be used to characterize the extensive genetic diversity observed within immunogenetic sequencing datasets (e.g., B cell receptor (BCR) repertoire sequencing). We will develop graph-based approaches to 1) analyze high-throughput immunogenetic sequencing data (e.g., BCR repertoire profiling) and 2) perform genetic association tests focused on phenotypes related to the host immune response to vaccines, infection, therapeutics developed by Boehringer Ingelheim, and autoimmune diseases. We will also assess whether graph structure/topology is clinically informative and, by annotating regions across the graph using external multi-modal data, whether annotated genome graphs can facilitate immunogenetics-focused genome-wide association studies.

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

    Topic: Multimodal network-based cancer heterogeneity analysis

    Project Summary: In the past decade, the maturation of profiling techniques has led to the discovery that previously defined cancer types/subtypes, which are based on pathological images, can be further classified into sub-subtypes. These refined sub-subtypes have distinct omics landscapes and clinical courses and demand different treatment strategies. Accordingly, the first guiding principle of this study is that effectively integrating multimodal data, in particular pathological imaging and multi-omics data, can lead to more refined cancer heterogeneity structures. In heterogeneity analysis, incorporating the interconnections among variables can further reveal more subtle cancer heterogeneity structures. As such, the second guiding principle is that using cutting-edge methods to incorporate these interconnections can further improve cancer heterogeneity analysis. Our overarching goal is to develop more effective statistical learning methods for cancer heterogeneity analysis, which can deepen our understanding of cancer biology and facilitate more personalized treatment.

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22

    Topic: Mechanism-based identification of biomarkers and intervention targets from multi-omics datasets

    Project Summary: Recent technological advances allowing for the global characterization of genomic variants, transcription profiles, epigenomic profiles, and protein markers, often down to the single-cell level, have provided unprecedented insights into the homeostatic and perturbed states of biological systems. Analyzing these vast multi-omics datasets to obtain clinically actionable biomarkers and promising intervention targets remains a formidable challenge. Predicting the quality and quantity of an individual's immune response in health and disease is a quintessential example. Our proposed research will combine statistical analyses with causal inference and multi-scale mathematical modeling to develop a multi-omics data analysis pipeline that (a) provides mechanistic insights into the underlying biological process, (b) captures the diversity seen across individuals, and (c) identifies complex features and rules that are predictive of the response to perturbations. We will apply our approach to datasets characterizing the vaccination response to identify predictive biomarkers and intervention targets that could improve vaccine efficacy.

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '23

    Topic: Moving from GWAS to Causal Genes and Variants

    Project Summary: The size and ethnic diversity of emerging sequencing datasets are growing rapidly. Combining these data with emerging single-cell omic datasets and AI models for predicting gene activity (e.g., expression) offers an unprecedented opportunity to uncover the causal genes and cell types that drive human traits and disease. However, in emerging sequencing datasets, the strong, often perfect, linkage among associated ultra-rare variants can yield an unwieldy list of candidate causal variants. This problem is exacerbated by the presence of multiple causal variants (allelic heterogeneity) and by migration events, both of which are more common in ethnically diverse datasets. This fine-mapping enigma motivates our current research. Using novel statistical methods, we aim to develop an automated yet interpretable approach that does not seek to isolate causal variants, but instead directly identifies target genes and pathways from phenotypic and single-cell xQTL data across different cohorts.

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '23

    Topic: Multi-omics Analytics for Personalized Medicine

    Project Summary: First, we will develop a framework that integrates spatial transcriptomics, single-cell RNA-seq, single-cell ATAC-seq, high-resolution imaging, and single-cell targeted protein data to identify tissue microenvironments. Using network-based variable selection and regression on cell morphology, we will aggregate the selected features using cell adjacency matrices to cluster tissue areas into microenvironments. This multi-modal integration promises to uncover new microenvironment characteristics for targeted therapeutics. Second, we will focus on identifying disease progression-associated changes in tissue microenvironments. Using known biomarker genes, we will differentiate microenvironments and assess disease severity and progression. We will analyze changes in cell composition, expression profiles, gene regulatory networks, and cell-cell communication networks. Deconvolved spatial transcriptomics and causal network approaches will aid in constructing gene regulatory networks, while Connectome and graph attention network methods will establish cell-cell communication networks. Correlations with disease progression will be examined independently and then combined using neural networks to gain a comprehensive understanding that can guide precise therapeutic development.

Alumni Fellows

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21

    Topic: Explainable machine learning models for indication expansion

    Project Summary: Bringing a novel therapeutic to market is a time-consuming and expensive endeavor. Many promising candidates identified through virtual screening and preclinical studies fail in clinical trials due to poor efficacy or lack of improvement over the standard of care. Instead, a target-centric approach, based on repurposing safe compounds with a known mode of action (MoA), offers many advantages. First, alternative indications (diseases) with high medical need and market potential are identified for a given target by linking information about the MoA, disease state, and patient populations obtained from large public and proprietary datasets. Subsequently, suitable disease models are chosen (e.g., by literature mining), and therapeutics are fast-tracked for approval, thus limiting the risk of failure. The aim of this project is to develop and integrate novel machine learning methods, with an emphasis on the explainability of the predicted outcome, to improve the overall performance of Boehringer Ingelheim’s drug repurposing pipeline.

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21

    Topic: A bioinformatics journey: from EHR to genetic data

    Project Summary: As a postdoc at the Yale Center for Biomedical Data Science, Xiayuan will work on high-throughput biomedical data, including electronic health records (EHRs) and genetic data. His research will focus on extending state-of-the-art machine learning approaches in health using EHRs, developing machine learning algorithms for drug discovery and adverse drug effects, and applying statistical methods to challenging problems in genetic data. Based on his PhD research, he believes that family history-linked EHRs succinctly encompass shared genetic, epigenetic, and environmental features that enhance the analysis of human disease. He plans to apply machine learning algorithms in the healthcare domain, such as disease risk prediction, precision medicine, and clinical applications using family history-linked EHRs. On the genetic data side, his work is devoted to addressing challenging problems in single-cell RNA sequencing data and to developing innovative statistical models for analyzing the impact of genetic variants on human disease.

  • Zhe Sun

    Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21

    Topic: Multi-modal and multi-source data integration for brain imaging

    Project Summary: With data collected from various brain imaging techniques, there is a need for neurobiologically meaningful analytical tools that integrate imaging modalities across techniques and trait types. To this end, we have proposed a series of neurobiologically interpretable models to achieve complex data integration, with applications to neurodegenerative diseases and mental health.