Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21
Yale-BI Joint Research Committee (JRC) and Fellows
Yale-Boehringer Ingelheim Biomedical Data Science Fellows
Topic: Explainable machine learning models for indication expansion
Project Summary: Bringing a novel therapeutic to market is a time-consuming and expensive endeavor. Many promising candidates identified through virtual screening and preclinical studies fail in clinical trials due to poor efficacy or lack of improvement in the standard of care. Instead, a target-centric approach, based on the repurposing of safe compounds with known mode of action (MoA), offers many advantages. First, alternative indications (diseases) with high medical need and market potential are identified for a given target, by linking information about the MoA, disease state, and patient populations obtained from large public and proprietary datasets. Subsequently, suitable disease models are chosen (e.g. by literature mining), and therapeutics are fast-tracked for approval, thus limiting the risk of failure. The aim of this project is to develop and integrate novel machine learning methods, with emphasis on explainability of the predicted outcome, to improve the overall performance of Boehringer Ingelheim’s drug repurposing pipeline.
Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21
Topic: A bioinformatics journey: from EHR to genetic data
Project Summary: As a postdoc at the Yale Center for Biomedical Data Science, Xiayuan will work on high-throughput biomedical data, including electronic health records (EHRs) and genetic data. His research will focus on extending state-of-the-art machine learning approaches in health using EHRs, developing machine learning algorithms for drug discovery and adverse drug effects, and applying statistical methods to investigate challenging problems in genetic data. Based on his PhD research, he believes family-history-linked EHRs succinctly encompass shared genetic, epigenetic, and environmental features that enhance the analysis of human disease. He plans to apply machine learning algorithms in the healthcare domain, such as disease risk prediction, precision medicine, and clinical applications using family-history-linked EHRs. On the genetic data side, his research is devoted to addressing challenging problems in single-cell RNA sequencing data and to developing innovative statistical models for analyzing the impact of genetic variants in human disease.
Yale-Boehringer Ingelheim Biomedical Data Science Fellow '21
Topic: Identify brain functional subnetworks and associated genetic vulnerabilities with comorbid mental disorders
Project Summary: Emerging evidence indicates that 1) the boundaries between psychiatric illnesses are not sharp in terms of behavioral traits and brain function; 2) individual behavioral differences are linked to variability in functional brain networks. These findings suggest that the spectra of symptom profiles observed in patients may arise through discernible patterns of the functional connectome, with the disturbance of individual systems preferentially contributing to domain-specific, but disorder-general, impairments. Meanwhile, gene transcription could strongly correlate with network topography, potentially driving comorbidity between symptomatically related disorders. Hence, our overarching goal is to identify brain functional network fingerprints, link them to dimensional symptom profiles, and characterize the associated genetic underpinnings through a suite of powerful, biologically plausible, and computationally efficient statistical models. Successful completion of this research will discover network-level biomarkers and associated genetic vulnerabilities, and facilitate the development of novel treatments and future classification schemes.
Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22
Topic: Multi-omics Analytics and Emerging Technologies
Project Summary: Graph genome-based models can characterize genetic variation across both microbial organisms and diverse regions of the human genome. We aim to investigate whether these models can also be used to characterize the extensive genetic diversity observed within immunogenetic sequencing datasets (e.g., B cell receptor (BCR) repertoire sequencing). We will develop graph-based approaches to 1) analyze high-throughput immunogenetic sequencing (e.g., BCR repertoire profiling) and 2) perform genetic association tests focused on phenotypes related to the host immune response to vaccines, infection, therapeutics developed by Boehringer Ingelheim, and autoimmune diseases. We will also assess whether graph structure/topology is clinically informative and, by annotating regions across the graph using external multi-modal data, assess whether annotated genome graphs can facilitate immunogenetic-focused genome-wide association studies.
Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22
Topic: Multimodal network-based cancer heterogeneity analysis
Project Summary: In the past decade, the maturity of profiling techniques has led to the discovery that previously defined cancer types/subtypes, which are based on pathological images, can be further classified into sub-subtypes. These refined classes have different omics landscapes and clinical paths and demand different treatment strategies. Accordingly, the first guiding principle of this study is that effectively integrating multimodal data, in particular pathological imaging and multi-omics data, can lead to more refined cancer heterogeneity structures. In heterogeneity analysis, incorporating the interconnections among variables can further reveal more subtle cancer heterogeneity structures. As such, the second guiding principle is that utilizing cutting-edge methods to incorporate interconnections can further improve cancer heterogeneity analysis. Our overarching goal is to develop more effective statistical learning methods for cancer heterogeneity analysis, which can deepen our understanding of cancer biology and facilitate more personalized treatment.
Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22
Topic: Multi-omics Analytics and Emerging Technologies
Project Summary: Recent efforts have used CRISPR knockout or activation screens to identify targets that boost T cell effector function and further leverage immune killing. However, manipulating a single gene may still be insufficient to overcome resistance, owing to genes that can compromise its function in immune cell signaling. Paralogs derived from the same ancestor have been reported to show synthetic lethal interactions and might function jointly in augmenting cancer immunity. In this project, we will establish a computational model for predicting paralog pairs that can team up in cancer immunotherapy, by integrating genome-wide CRISPR screens and perturbation molecular profiles from the Cancer Dependency Map (DepMap) and Connectivity Map (CMap) with cancer datasets from patients receiving immunotherapy. The outcome of this research will deliver in silico tools for screening paralog pairs that can boost immune response, which could inspire effective combination therapeutic strategies toward precision treatment.
Yale-Boehringer Ingelheim Biomedical Data Science Fellow '22
Topic: Mechanism-based identification of biomarkers and intervention targets from multi-omics datasets
Project Summary: Recent technological advances allowing for the global characterization of genomic variants, transcription profiles, epigenomic profiles, and protein markers, often down to the single-cell level, have provided unprecedented insights into the homeostatic and perturbed states of biological systems. Analyzing these vast multi-omics datasets to obtain clinically actionable biomarkers and promising intervention targets remains a formidable challenge. Prediction of the individual immune response quality and quantity in health and disease is one quintessential case. Our proposed research will combine statistical analyses with causal inference and multi-scale mathematical modeling to develop a multi-omics data analysis pipeline that (a) provides mechanistic insights into the underlying biological process, (b) captures the diversity seen across individuals, and (c) identifies complex features and rules that are predictive of the response to perturbations. We will apply our approach to datasets characterizing the vaccination response to identify predictive biomarkers and intervention targets to improve vaccine efficacy.
Yale Mentors
Associate Professor of Genetics and of Computer Science
Smita Krishnaswamy is an Associate Professor in Genetics and Computer Science. She is affiliated with the applied math program, computational biology program, Yale Center for Biomedical Data Science and Yale Cancer Center. Her lab works on the development of machine learning techniques to analyze high dimensional high throughput biomedical data. Her focus is on unsupervised machine learning methods, specifically manifold learning and deep learning techniques for detecting structure and patterns in data. She has developed algorithms for non-linear dimensionality reduction and visualization, learning data geometry, denoising, imputation, inference of multi-granular structure, and inference of feature networks from big data. Her group has applied these techniques to many data types such as single cell RNA-sequencing, mass cytometry, electronic health record, and connectomic data from a variety of systems. Specific application areas include immunology, immunotherapy, cancer, neuroscience, developmental biology and health outcomes. Smita has a Ph.D. in Computer Science and Engineering from the University of Michigan.
Associate Professor of Biostatistics
Dr. Wang is an Associate Professor of Biostatistics at Yale School of Public Health. Her research focuses on combining genetics, genomics, immunology, and statistical modeling to answer biologically important questions in genetic epidemiological studies. Dr. Wang's statistical expertise lies in longitudinal data analysis, varying coefficient models, mixed effects models, kernel machine methods, mediation analysis, machine learning methods, and network analysis. She develops statistically innovative methods and computationally efficient tools in large-scale genetic and genomic studies to identify genetic susceptibility variants and advance the understanding of the etiology of complex diseases including breast cancer, alcohol and drug abuse, asthma, autism, obesity, lung and cardiovascular diseases. Current studies include using next-generation sequencing data to detect rare genetic variants in longitudinal genetic studies, combining knowledge in genomics and immunology to understand the risk of breast cancer survival, addressing statistical challenges in single-cell RNA sequencing data and spatial transcriptomics, and machine learning for risk prediction in electronic health records data.
Associate Professor of Biostatistics
Dr. Zhao is an Associate Professor in the Department of Biostatistics at Yale School of Public Health. She is also affiliated with Yale Center for Analytical Sciences, Yale Alzheimer's Disease Research Center and Yale Computational Biology and Bioinformatics. Her main research focuses on the development of statistical and machine learning methods to analyze large-scale complex data (imaging, -omics, EHRs), Bayesian methods, feature selection, predictive modeling, data integration, missing data and network analysis. She has strong interests in biomedical research areas including mental health, mental disorders, and aging. Her most recent research agenda includes analytical method development and applications on brain network analyses, multimodal imaging modeling, imaging genetics, and the integration of biomedical data with EHR data. Her research is supported by multiple NIH grants. Dr. Zhao received her Ph.D. in Biostatistics from Emory University and completed postdoctoral training at the Statistical and Applied Mathematical Sciences Institute (SAMSI) and the University of North Carolina at Chapel Hill. Prior to coming to Yale, she was an Assistant Professor in Biostatistics at Cornell University, Weill Cornell Medicine.
Anthony N Brady Professor of Pathology; Co-Director of Graduate Studies, Computational Biology and Bioinformatics
Dr. Steven Kleinstein is a computational immunologist with a combination of big data analysis and immunology domain expertise. His research interests include both developing new computational methods and applying these methods to study human immune responses. Dr. Kleinstein received a B.A.S. in Computer Science from the University of Pennsylvania and a Ph.D. in Computer Science from Princeton University. He is currently Professor of Pathology (with a secondary appointment in Immunobiology) at the Yale School of Medicine, and a member of the Interdepartmental Program in Computational Biology and Bioinformatics (CBB), and the Human and Translational Immunology Program. Specific areas of research focus include:
- High-throughput single-cell B cell receptor (BCR) repertoire profiling (AIRR-seq, Rep-seq, scRNA-seq+VDJ)
- Multi-omic immune signatures of human infection and vaccination responses
Interim Department Chair and Professor of Biostatistics; Affiliated Faculty, Yale Institute for Global Health; Director, Biostatistics and Bioinformatics Shared Resource
Dr. Ma received his Ph.D. degree in statistics at the University of Wisconsin in 2004. Prior to arriving at Yale, Dr. Ma was a Senior Fellow in the Collaborative Health Studies Coordinating Center (CHSCC) and the Department of Biostatistics at the University of Washington. He has been involved in developing novel statistical and bioinformatics methodologies for analysis of cancer (NHL, breast cancer, melanoma, lung cancer), mental disorders, and cardiovascular diseases. He has also been involved in health economics research, with special interest in health insurance in developing countries.
Associate Professor of Genetics
Sidi Chen joined the Yale Faculty in 2015 in the Department of Genetics, Systems Biology Institute, and Yale Cancer Center. His research focuses on providing a global understanding of biological systems and development of novel breakthrough therapeutics. Chen developed and applied genome editing and high-throughput screening technologies, precision CRISPR-based in vivo models of cancer, and global mapping of functional drivers of cancer oncogenesis and metastasis. He is leading a research group to seek global understandings of the molecular and cellular factors controlling disease progression and immunity. His group continuously invents versatile systems that enable rapid identification of novel targets and development of new modalities of cancer immunotherapy, cell therapy and gene therapy. His goal is to uncover novel insights in cancer and various other immunological diseases and develop next generation therapeutics. Dr. Chen has received a number of national and international awards including the Pershing Square Sohn Prize, DoD Era of Hope Scholar, NIH Director’s New Innovator Award, Blavatnik Innovator Award, Yale Cancer Center Basic Science Research Prize, AACR NextGen Award for Transformative Cancer Research, Ludwig Foundation Award, Damon Runyon Cancer Research Fellow, Dale Frey Award for Breakthrough Scientists, TMKF Innovative/Translation Cancer Research Award, BCA Exceptional Research Grant Award, MRA Young Investigator Award, V Scholar, Bohmfalk Scholar, Ludwig Family Foundation Award, St. Baldrick’s Foundation Award, CRI Clinic & Laboratory Integration Program (CLIP), MIT Technology Review Top 35 Innovators (Regional), and Sontag Foundation Distinguished Scientist Award.
Professor of Immunobiology and Biomedical Engineering; Director, Yale Center for Systems and Engineering Immunology (CSEI)
John Tsang is a systems immunologist, computational biologist, and engineer. Tsang earned his PhD in biophysics and systems biology from Harvard University and trained in computer engineering (BASc) and computer science (MMath) at the University of Waterloo, Canada. He is currently Professor of Immunobiology and Biomedical Engineering at Yale University; he is also the founding Director of the Yale Center for Systems and Engineering Immunology (CSEI), which serves as a home and cross-departmental center of research for systems, quantitative, and synthetic immunology at Yale. Prior to joining Yale, he was a tenured Senior Investigator in the National Institutes of Health's Intramural Research Program and led a laboratory focusing on systems and quantitative immunology at the National Institute of Allergy and Infectious Diseases (NIAID). He was the Co-Director of the Trans-NIH Center for Human Immunology (CHI) and led its research program in systems human immunology. He remains an Adjunct Investigator at NIAID. He has won multiple awards for his research, including several NIH/NIAID Merit Awards recognizing his scientific leadership in systems immunology, COVID-19, and human immunology research. His work on human immune variability, systems immunology, and prediction of vaccination responses was selected as a Top NIAID Research Advance of 2014. Tsang has served as an advisor on systems immunology and computational biology for numerous programs and organizations, including the Allen Institute, World Allergy Organization, National Cancer Institute, National Institute of Allergy and Infectious Diseases, National Institute of Diabetes and Digestive and Kidney Diseases, and the Fred Hutchinson Cancer Center. 
He currently serves on the Editorial Board of PLOS Biology and the Scientific Advisory Board of NIAID ImmPort, the NIAID Influenza IMPRINT Program, the NIH Common Fund Cellular Senescence Network (SenNet), Vaccine and Immunology Statistical Center of the Gates Foundation, the Human Immunome Project, and CytoReason Ltd.
Boehringer Ingelheim Mentors
Scientist, CB3 Team in Computational Biology and Digital Sciences
Sergio has worked at Boehringer Ingelheim since May 2019 as a data scientist. His work mainly comprised the integration and interpretation of multi-omic datasets and the assessment of preclinical models and treatments in fibrotic diseases. More recently, Sergio moved to the CB3 team to focus on data integration and harmonisation initiatives, both from the conceptual and practical points of view.
His background is in Mathematics and Industrial Engineering, under the inter-disciplinary CFIS program at the Polytechnic University of Catalonia. He then pursued a PhD in Biomedical Engineering at the same university, where he explored the application of network propagation algorithms to computational biology tasks such as the interpretation of metabolomics experiments and the prediction of new disease- and pathway-related genes. Specifically, Sergio studied the biases in propagation methods and the consequences of their removal.
LinkedIn: https://www.linkedin.com/in/sergi-picart-armada-b6017b97
Global Computational Biology and Digital Sciences
Gregorio Alanis-Lobato obtained his MSc in Computer Science and his PhD in Computational Biology at KAUST developing methods to predict protein-protein interactions based on the topological structure of experimentally-derived networks. Then, he moved to the Johannes Gutenberg University in Mainz, Germany to continue his research on this topic as a postdoctoral fellow and to enhance the functionality of HIPPIE, a webtool to construct reliable and context-specific human protein networks. This was followed by a second postdoc at the Francis Crick Institute in London, where he worked on the integration of different single-cell omics data modalities for the construction of gene regulatory networks in early human embryos. In addition, he developed computational pipelines to assess whether CRISPR-Cas9-targeted human preimplantation embryos had unintended on-target mutations based on single-cell genomics and transcriptomics datasets. Gregorio joined BI in late 2020 to support pre-clinical research for the CNS Diseases and Research Beyond Borders therapeutic areas with his expertise in omics data analysis and integration.
LinkedIn: https://www.linkedin.com/in/greg-al/
Google Scholar: https://scholar.google.com/citations?user=nAGRhSEAAAAJ&hl=en
Senior Principal Scientist, Global Computational Biology and Digital Sciences
Dr. Frank Li is a Senior Principal Scientist at the Department of Global Computational Biology and Digital Sciences, Boehringer Ingelheim, Ridgefield, CT. In this role, he oversees bioinformatics analyses using systems biology approaches to better understand and characterize the pathological mechanisms contributing to the onset and progression of autoimmune diseases, including Inflammatory Bowel Disease, Systemic Sclerosis, and Idiopathic Pulmonary Fibrosis. His group uses holistic “omics” approaches to facilitate discoveries of novel therapeutic concepts, determination of novel biomarkers, and patient stratification and enrichment supporting clinical drug development. He obtained his Ph.D. from the University of North Carolina at Chapel Hill, with doctoral research focused on the breakdown of immunological tolerance of autoreactive CD4+ T cells in autoimmune Type I diabetes. He then continued with postdoctoral studies at Harvard Medical School, where he worked in the field of T cell tolerance, attempting to decipher the molecular and cellular mechanisms that control T cell differentiation and the molecular mechanisms controlling the plasticity of FOXP3+ Treg cells, using both canonical immunological and systems biology approaches.
LinkedIn: https://www.linkedin.com/in/frank-l-li-479b9a6/
Head of Human Genetics
Dr. Zhihao Ding completed his PhD at the Wellcome Trust Sanger Institute, the University of Cambridge, studying the genetics of cellular traits and genomic algorithms. He held a postdoctoral position at the Wellcome Trust Centre for Human Genetics, University of Oxford, studying the genetics of rare diseases and cancers. In 2015, he transitioned to industry, working at Genomics PLC, Oxford, UK, where he developed algorithmic solutions and products for rare disease diagnostics. He led work packages for a panel of pharmaceutical companies in evaluating specific targets in disease areas of therapeutic interest. Before joining Boehringer Ingelheim (BI), Dr. Ding led target projects on NASH at Novo Nordisk Research Centre Oxford (NNRCO). He was one of the first genetic scientists to join NNRCO, where he helped build the genetic capacity and initiated several academic research collaborations with the Big Data Institute, University of Oxford. Dr. Ding joined BI as the Head of Human Genetics in July 2020, where he leads genetic initiatives in gCBDS.
LinkedIn: https://www.linkedin.com/in/zhihao-ding-oxford/
Google Scholar: https://scholar.google.com/citations?user=nbIFJRAAAAAJ&hl=en
Computational Biology Expert Lead
Dr. Di Feng is a senior member of GCBDS (Global Computational Biology and Digital Science), working at Boehringer Ingelheim's US headquarters in Ridgefield, CT on computational drug discovery research. He is a Computational Biology professional with substantial multidisciplinary expertise in Computational Immunology, Pathology, and Machine Intelligence applications for drug discovery. Dr. Feng managed the Artificial Intelligence and Machine Learning partnership with The Center of Computational Imaging and Personalized Diagnostics (CCIPD) at Case Western's University Hospital, the Cleveland Medical Center. Dr. Feng has also contributed open source software tools such as Single Cell Explorer, a platform to facilitate the collaboration between computational biologists and experimental scientists. Dr. Feng led computational projects for a small molecule and biological drug program from early research to clinical trials. He has worked with and led teams to solve complex research challenges using computational approaches across multiple therapeutic areas such as Cancer Immunology, Immunomodulation, Immunology and Respiratory, and Cardiometabolic diseases. He received his Ph.D. from Rutgers - Graduate School of Biomedical Sciences, studying basic and clinical biology of plasmacytoid dendritic cells, followed by postdoctoral research on autoimmune and cancer susceptibility genes that integrated bioinformatics with wet lab science. He also earned his medical degree from Shanghai Jiao Tong University School of Medicine. Prior to joining Boehringer Ingelheim, he developed therapeutics supported by Lupus Research Alliance.
LinkedIn: https://www.linkedin.com/in/di-feng-23310810/
Team Lead Statistical Modeling
Johann de Jong obtained his PhD in Computational Cancer Biology from Delft University of Technology. Since then, he has gained experience in a wide variety of domains, ranging from gene regulation and chromatin biology to cancer research and neurological disorders, in both academia (the Netherlands Cancer Institute) and industry (BASF, UCB Pharma). He currently leads the Statistical Modeling team within Human Genetics at Boehringer Ingelheim, which focuses on developing and applying novel machine learning and statistical models for biomarker/target identification and patient stratification by integrating prior knowledge with multi-modal and longitudinal data sources including human biobanks.
LinkedIn: https://www.linkedin.com/in/johann-de-jong-a0b01624/
Google Scholar: https://scholar.google.nl/citations?user=c5IUr7QAAAAJ&hl=en
Principal Scientist, Global Computational Biology and Digital Sciences
Zuojian Tang is a Principal Scientist in Global Computational Biology and Digital Sciences at Boehringer Ingelheim (BI), with extensive experience in both computational biology and bioinformatics engineering. Prior to joining BI, she worked as a bioinformatics engineer at Memorial Sloan Kettering Cancer Center. She also spent about 10 years at New York University Langone Health as a senior research scientist. Zuojian has designed and developed widely recognized and adopted analysis methods and systems for various computational biology applications. She has published more than 35 peer-reviewed full-length papers, with more than 3000 citations. Zuojian received her Ph.D. in Systems and Computational Biomedicine from New York University, a Master of Computer Science from McGill University, Canada, and a Bachelor of Engineering in China.
Principal Scientist and CIIM Partner, Global Computational Biology and Digital Sciences
Dr. Fahmy is a Principal Scientist and GCBDS Partner for the therapeutic area (TA) of cancer immunology and immune modulation (CIIM) at the Department of Global Computational Biology and Digital Sciences (GCBDS), Boehringer Ingelheim (BI), Biberach, Germany. In his role, he oversees the computational biology analysis that drives the discovery research process for the CIIM portfolio and executes the TA strategy for identifying new targets and novel therapeutic concepts. Before joining BI, Dr. Fahmy was the head of the Integrative OMICs Analysis group at Rostock University Medical Center, Rostock, Germany. His research interests include biological data integration, developing integrative bioinformatics methods for precision medicine, systems biology, drug combination predictions, radiogenomics, and cancer genomics. Dr. Fahmy received his BSc degree in systems and biomedical engineering with highest honors from Cairo University before getting a full grant to obtain an IT diploma in the field of software engineering from the Information Technology Institute (ITI) in Egypt. Dr. Fahmy worked for ESRINEA as a senior team leader focusing on developing and integrating enterprise IT solutions. He received his MSc degree from the School of Computer Science at the University of Nottingham, UK, and his PhD degree in computer science (specialization: bioinformatics) from Saarland University, Saarbrücken, Germany. He was a visiting research associate at international research centres such as the National Institute of Informatics (NII) in Tokyo, Japan, the Centre for Bioinformatics (CBI) in Saarbrücken, Germany, and the Royal Perth Hospital, University of Western Australia in Perth, Australia.