Yale-BI Joint Research Committee (JRC) and Fellows

Yale-Boehringer Ingelheim Biomedical Data Science Fellows

  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow

    Postdoctoral Associate

    Topic: Explainable machine learning models for indication expansion

    Project Summary: Bringing a novel therapeutic to market is a time-consuming and expensive endeavor. Many promising candidates identified through virtual screening and preclinical studies fail in clinical trials due to poor efficacy or lack of improvement in the standard of care. Instead, a target-centric approach, based on the repurposing of safe compounds with known mode of action (MoA), offers many advantages. First, alternative indications (diseases) with high medical need and market potential are identified for a given target by linking information about the MoA, disease state, and patient populations obtained from large public and proprietary datasets. Subsequently, suitable disease models are chosen (e.g. by literature mining), and therapeutics are fast-tracked for approval, thus limiting the risk of failure. The aim of this project is to develop and integrate novel machine learning methods, with emphasis on explainability of the predicted outcome, to improve the overall performance of Boehringer Ingelheim’s drug repurposing pipeline.
  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow

    Postdoctoral Associate

    Topic: A bioinformatics journey: from EHR to genetic data

    Project Summary: As a postdoc at the Yale Center for Biomedical Data Science, Xiayuan will work on high-throughput biomedical data, including electronic health records (EHRs) and genetic data. His research will focus on extending state-of-the-art machine learning approaches in health using EHRs, developing machine learning algorithms for drug discovery and adverse drug effects, and applying statistical methods to challenging problems in genetic data. Based on his PhD research, he believes that family-history-linked EHRs succinctly encompass shared genetic, epigenetic, and environmental features that enhance the analysis of human disease. He plans to apply machine learning algorithms in the healthcare domain, for example to disease risk prediction, precision medicine, and clinical applications using family-history-linked EHRs. On the genetic side, his work addresses challenging problems in single-cell RNA sequencing data and develops innovative statistical models for analyzing the impact of genetic variants on human disease.
  • Yale-Boehringer Ingelheim Biomedical Data Science Fellow

    Topic: Identify brain functional subnetworks and associated genetic vulnerabilities with comorbid mental disorders

    Project Summary: Emerging evidence indicates that 1) the boundaries between psychiatric illnesses are not sharply defined in terms of behavioral traits and brain function, and 2) individual behavioral differences are linked to variability in functional brain networks. Together, these findings suggest that the spectrum of symptom profiles observed in patients may arise from discernible patterns in the functional connectome, with disturbances of individual systems preferentially contributing to domain-specific, but disorder-general, impairments. Meanwhile, gene transcription may correlate strongly with network topography, potentially driving comorbidity between symptomatically related disorders. Hence, our overarching goal is to identify brain functional network fingerprints, link them to dimensional symptom profiles, and characterize the associated genetic underpinnings through a suite of powerful, biologically plausible, and computationally efficient statistical models. Successful completion of this research is expected to reveal network-level biomarkers and associated genetic vulnerabilities, and to facilitate the development of novel treatments and future classification schemes.

Yale Mentors

  • Associate Professor of Genetics and of Computer Science

    Smita Krishnaswamy is an Associate Professor of Genetics and of Computer Science. She is affiliated with the applied math program, the computational biology program, the Yale Center for Biomedical Data Science, and the Yale Cancer Center. Her lab develops machine learning techniques to analyze high-dimensional, high-throughput biomedical data. Her focus is on unsupervised machine learning methods, specifically manifold learning and deep learning techniques for detecting structure and patterns in data. She has developed algorithms for non-linear dimensionality reduction and visualization, learning data geometry, denoising, imputation, inference of multi-granular structure, and inference of feature networks from big data. Her group has applied these techniques to many data types, such as single-cell RNA-sequencing, mass cytometry, electronic health record, and connectomic data from a variety of systems. Specific application areas include immunology, immunotherapy, cancer, neuroscience, developmental biology, and health outcomes. Smita has a Ph.D. in Computer Science and Engineering from the University of Michigan.
  • Associate Professor of Biostatistics

    Dr. Wang is an Associate Professor of Biostatistics at the Yale School of Public Health. Her research combines genetics, genomics, immunology, and statistical modeling to answer biologically important questions in genetic epidemiological studies. Her statistical expertise lies in kernel machine methods, mixed effects models, correlated data, and longitudinal data analysis. She develops statistically innovative methods and computationally efficient tools for large-scale genetic and genomic studies to identify genetic susceptibility variants and advance the understanding of the etiology of complex diseases, including alcohol and drug abuse, asthma, obesity, cardiovascular diseases, and cancer. Current studies include using next-generation sequencing data to detect rare genetic variants in longitudinal genetic studies, combining knowledge in genomics and immunology to understand breast cancer survival, and analyzing differential gene expression in single-cell RNA sequencing data.
  • Associate Professor of Biostatistics

    Dr. Zhao is an Associate Professor in the Department of Biostatistics at the Yale School of Public Health and is affiliated with the Yale Center for Analytical Sciences and the Yale Alzheimer's Disease Research Center. Her research focuses on developing statistical and machine learning methods for large-scale complex data (imaging, -omics, EHRs), with emphasis on Bayesian methods, feature selection, predictive modeling, data integration, missing data, and network analysis. She has strong interests in biomedical research areas including mental health, cancer, and cardiovascular diseases. Dr. Zhao received her Ph.D. in Biostatistics from Emory University and completed postdoctoral training at the Statistical and Applied Mathematical Sciences Institute (SAMSI) and the University of North Carolina at Chapel Hill. Prior to coming to Yale, she was an Assistant Professor in Biostatistics at Cornell University, Weill Cornell Medicine.

Boehringer Ingelheim Mentors

  • Scientist, CB3 Team in Computational Biology and Digital Sciences

    Sergio has worked at Boehringer Ingelheim since May 2019 as a data scientist. His work mainly comprised the integration and interpretation of multi-omic datasets and the assessment of preclinical models and treatments in fibrotic diseases. More recently, Sergio moved to the CB3 team to focus on data integration and harmonisation initiatives, from both the conceptual and practical points of view. His background is in Mathematics and Industrial Engineering, under the interdisciplinary CFIS program at the Polytechnic University of Catalonia. He then pursued a PhD in Biomedical Engineering at the same university, where he explored the application of network propagation algorithms to computational biology tasks such as the interpretation of metabolomics experiments and the prediction of new disease- and pathway-related genes. Specifically, Sergio studied the biases in propagation methods and the consequences of their removal. Links: Sergio Picart-Armada - Google Scholar
  • Global Computational Biology and Digital Sciences

    Gregorio Alanis-Lobato obtained his MSc in Computer Science and his PhD in Computational Biology at KAUST, where he developed methods to predict protein-protein interactions based on the topological structure of experimentally derived networks. He then moved to the Johannes Gutenberg University in Mainz, Germany, to continue his research on this topic as a postdoctoral fellow and to enhance the functionality of HIPPIE, a web tool for constructing reliable and context-specific human protein networks. This was followed by a second postdoc at the Francis Crick Institute in London, where he worked on integrating different single-cell omics data modalities to construct gene regulatory networks in early human embryos. In addition, he developed computational pipelines to assess, based on single-cell genomics and transcriptomics datasets, whether CRISPR-Cas9-targeted human preimplantation embryos had unintended on-target mutations. Gregorio joined BI in late 2020 to support preclinical research for the CNS Diseases and Research Beyond Borders therapeutic areas with his expertise in omics data analysis and integration.
  • Senior Principal Scientist, Global Computational Biology and Digital Sciences

    Dr. Frank Li is a Senior Principal Scientist in the Department of Global Computational Biology and Digital Sciences, Boehringer Ingelheim, Ridgefield, CT. In this role, he oversees bioinformatics analyses that use systems biology approaches to better understand and characterize the pathological mechanisms contributing to the onset and progression of autoimmune diseases, including Inflammatory Bowel Disease, Systemic Sclerosis, and Idiopathic Pulmonary Fibrosis. His group uses holistic “omics” approaches to facilitate the discovery of novel therapeutic concepts, the identification of novel biomarkers, and patient stratification and enrichment in support of clinical drug development. He obtained his Ph.D. from the University of North Carolina at Chapel Hill, where his doctoral research focused on the breakdown of immunological tolerance of autoreactive CD4+ T cells in autoimmune Type I diabetes. He then pursued postdoctoral studies at Harvard Medical School, where he worked on T cell tolerance, seeking to decipher the molecular and cellular mechanisms that control T cell differentiation and the molecular mechanisms controlling the plasticity of FOXP3+ Treg cells, using both canonical immunological and systems biology approaches.
  • Head of Human Genetics

    Dr. Zhihao Ding completed his PhD training at the Wellcome Trust Sanger Institute, University of Cambridge, studying the genetics of cellular traits and genomic algorithms. He then held a postdoctoral position at the Wellcome Trust Centre for Human Genetics, University of Oxford, studying the genetics of rare diseases and cancers. In 2015, he transitioned to industry at Genomics PLC, Oxford, UK, where he developed algorithmic solutions and products for rare disease diagnostics. He led work packages for a panel of pharmaceutical companies evaluating specific targets in disease areas of therapeutic interest. Before joining Boehringer Ingelheim (BI), Dr. Ding led target projects on NASH at Novo Nordisk Research Centre Oxford (NNRCO). He was one of the first genetic scientists to join NNRCO, where he helped build its genetics capacity and initiated several academic research collaborations with the Big Data Institute, University of Oxford. Dr. Ding joined BI as Head of Human Genetics in July 2020, where he leads genetic initiatives in gCBDS.
  • Computational Biology Expert Lead

    Dr. Di Feng is a senior member of GCBDS (Global Computational Biology and Digital Science), working at Boehringer Ingelheim's US headquarters in Ridgefield, CT, on computational drug discovery research. He is a computational biology professional with substantial multidisciplinary expertise in computational immunology, pathology, and machine intelligence applications for drug discovery. Dr. Feng managed the Artificial Intelligence and Machine Learning partnership with The Center of Computational Imaging and Personalized Diagnostics (CCIPD) at Case Western's University Hospital, the Cleveland Medical Center. He has also contributed open-source software tools such as Single Cell Explorer, a platform that facilitates collaboration between computational biologists and experimental scientists. Dr. Feng led computational projects for small-molecule and biological drug programs from early research to clinical trials. He has worked with and led teams solving complex research challenges with computational approaches across multiple therapeutic areas, such as Cancer Immunology, Immunomodulation, Immunology and Respiratory, and Cardiometabolic diseases. He received his Ph.D. from Rutgers - Graduate School of Biomedical Sciences, studying the basic and clinical biology of plasmacytoid dendritic cells, followed by postdoctoral research on autoimmune and cancer susceptibility genes that integrated bioinformatics with wet-lab science. He also earned his medical degree from Shanghai Jiao Tong University School of Medicine. Prior to joining Boehringer Ingelheim, he developed therapeutics with support from the Lupus Research Alliance.
  • Team Lead Statistical Modeling

    Johann de Jong obtained his PhD in Computational Cancer Biology from Delft University of Technology. Since then, he has gained experience in a wide variety of domains, ranging from gene regulation and chromatin biology to cancer research and neurological disorders, in both academia (the Netherlands Cancer Institute) and industry (BASF, UCB Pharma). He currently leads the Statistical Modeling team within Human Genetics at Boehringer Ingelheim, which focuses on developing and applying novel machine learning and statistical models for biomarker/target identification and patient stratification by integrating prior knowledge with multi-modal and longitudinal data sources including human biobanks.