Hongyu Zhao, PhD

Ira V. Hiscock Professor of Biostatistics, Professor of Genetics and Professor of Statistics and Data Science; Affiliated Faculty, Yale Institute for Global Health

Research Summary

Our research is driven by the need to analyze and interpret large and complex data sets in biomedical research. For example, in genome-wide association studies involving thousands to hundreds of thousands of individuals, millions of DNA variants are analyzed for each person. Such data offer researchers the opportunity to identify genes and variants affecting disease susceptibility and to develop risk prediction models that facilitate disease prevention, monitoring, and treatment. The analysis of such data poses many statistical challenges, including the very high dimensionality of the markers, the relatively weak signals, and the need to incorporate prior knowledge and other data sets into the analysis. Other examples include the analysis of next-generation sequencing data, single cell data, image data, microbiome data, wearable device data, and electronic medical records, which present even greater statistical and computational challenges. Our group has been developing statistical methods to address these challenges, such as empirical Bayes methods to borrow information across different data sets, various generalizations of Gaussian graphical models for network inference, Markov random field models for spatial and temporal modeling, and general machine learning methods for high-dimensional data.

Specialized Terms: Statistical genomics and proteomics; Bioinformatics; Data integration; High dimensional data; Network and graphical models; Disease risk prediction; Microbiome; Cancer genomics; Single cell analysis; Imaging genetics; Wearable device; Electronic medical records

Extensive Research Description

  • Genome-Wide Association Studies: We are developing statistical methods that integrate diverse data types and prior biological knowledge to identify genes and variants associated with common diseases and to build risk prediction models. We also develop methods to infer the genetic architecture of complex diseases.
  • Single Cell Analysis: We are developing statistically robust and computationally efficient methods for single cell data, with the objectives of inferring genetic regulation and signaling at the single cell level and identifying cellular changes across different conditions.
  • Network Modeling: We are developing statistical methods to model biological networks under the general framework of Gaussian and other graphical models. Specific networks we are working on include gene expression regulatory networks, signaling networks, and eQTL networks.
  • Imaging Genetics: We focus on the analysis of data from several consortia to infer the impacts of genetic factors on imaging traits, as well as their associations with complex diseases.
  • Wearable Device: We are developing methods to extract signals from wearable devices and then combine them with genetic data to infer the genetic basis of activity and sleep traits.
  • Cancer Genomics: We are developing statistical and computational methods to analyze cancer genomics data, e.g., microarray and next-generation sequencing data, to identify cancer subtypes, driver mutations, biomarkers, and appropriate treatments for cancer patients.
  • Microbiome Analysis: We are developing modeling and analysis approaches for microbiome data generated by next-generation sequencing.
  • Proteomics: Our current focus is on targeted proteomics, such as Multiple Reaction Monitoring.


Research Interests

Genetics; Public Health; Computational Biology; Statistics; Genomics; Proteomics; Biostatistics; Single-Cell Analysis; Microbiota; Wearable Electronic Devices

Public Health Interests

Cancer; Genetics, Genomics, Epigenetics; Global Health; Infectious Diseases

Selected Publications