Hongyu Zhao, PhD
Research Summary
Our research is driven by the need to analyze and interpret large and complex data sets in biomedical research. For example, in genome-wide association studies involving thousands to hundreds of thousands of individuals, millions of DNA variants are analyzed for each person. Such data offer researchers the opportunity to identify genes and variants affecting disease susceptibility and to develop risk prediction models that facilitate disease prevention, monitoring, and treatment. The analysis of such data poses many statistical challenges, including the very high dimensionality of the markers, the relatively weak signals, and the need to incorporate prior knowledge and other data sets. Other examples include the analysis of next generation sequencing data, single cell data, image data, microbiome data, wearable device data, and electronic medical records, which present even greater statistical and computational challenges. Our group has been developing statistical methods to address these challenges, such as empirical Bayes methods to borrow information across data sets, generalizations of Gaussian graphical models for network inference, Markov random field models for spatial and temporal modeling, and general machine learning methods for high dimensional data.
Specialized Terms: Statistical genomics and proteomics; Bioinformatics; Data integration; High dimensional data; Network and graphical models; Disease risk prediction; Microbiome; Cancer genomics; Single cell analysis; Imaging genetics; Wearable device; Electronic medical records
Extensive Research Description
- Genome Wide Association Studies: We are developing statistical methods that integrate diverse data types and prior biological knowledge to identify genes and variants for common diseases and to build risk prediction models. We also develop methods to infer the genetic architecture of complex diseases.
- Single Cell Analysis: We are developing statistically robust and computationally efficient methods for single cell data, with the objectives of inferring genetic regulation and signaling at the single cell level and identifying cellular changes across different conditions.
- Network Modeling: We are developing statistical methods to model biological networks under the general framework of Gaussian and other graphical models. Specific networks we are working on include gene expression regulatory networks, signaling networks, and eQTL networks.
- Imaging Genetics: We focus on the analysis of data from several consortia to infer the impacts of genetics on imaging traits.
- Wearable Device: We are developing methods to extract signals from wearable devices and then combine them with genetics data to infer the genetic basis of activity and sleeping traits.
- Cancer Genomics: We are developing statistical and computational methods to analyze cancer genomics data, e.g. from microarrays and next generation sequencing, to identify cancer subtypes, driver mutations, and appropriate treatments for cancer patients.
- Microbiome Analysis: We are developing modeling and analysis approaches for microbiome data generated by next generation sequencing.
- Proteomics: Our current focus is on targeted proteomics, such as Multiple Reaction Monitoring.
Research Interests
Genetics; Public Health; Computational Biology; Statistics; Genomics; Proteomics; Biostatistics; Single-Cell Analysis; Microbiota; Wearable Electronic Devices
Public Health Interests
Genetics, Genomics, Epigenetics
Selected Publications
- Tang D, Park S, Zhao H. NITUMID: Nonnegative matrix factorization-based Immune-TUmor MIcroenvironment Deconvolution. Bioinformatics (Oxford, England) 2020, 36:1344-1350.
- Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Gu J, Muchnik S, Shi Y, Kunkle BW, Mukherjee S, Natarajan P, Naj A, Kuzma A, Zhao Y, Crane PK, Lu H, Zhao H. A statistical framework for cross-tissue transcriptome-wide association analysis. Nature Genetics 2019, 51:568-576.
- Wang T, Yang C, Zhao H. Prediction analysis for microbiome sequencing data. Biometrics 2019, 75:875-884.
- Park S, Zhao H, Birol I. Spectral clustering based on learning similarity matrix. Bioinformatics (Oxford, England) 2018, 34:2069-2076.
- Zhu Y, Sousa AMM, Gao T, Skarica M, Li M, Santpere G, Esteller-Cucala P, Juan D, Ferrández-Peral L, Gulden FO, Yang M, Miller DJ, Marques-Bonet T, Imamura Kawasawa Y, Zhao H, Sestan N. Spatiotemporal transcriptomic divergence across human and macaque brain development. Science (New York, N.Y.) 2018, 362.
- Lu Q, Powles RL, Abdallah S, Ou D, Wang Q, Hu Y, Lu Y, Liu W, Li B, Mukherjee S, Crane PK, Zhao H. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease. PLoS Genetics 2017, 13:e1006933.
- Hu Y, Lu Q, Powles R, Yao X, Yang C, Fang F, Xu X, Zhao H. Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Computational Biology 2017, 13:e1005589.
- Wang T, Zhao H. A Dirichlet-tree multinomial regression model for associating dietary nutrients with gut microorganisms. Biometrics 2017, 73:792-801.
- Lu Q, Li B, Ou D, Erlendsdottir M, Powles RL, Jiang T, Hu Y, Chang D, Jin C, Dai W, He Q, Liu Z, Mukherjee S, Crane PK, Zhao H. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics. American Journal Of Human Genetics 2017, 101:939-964.
- Hu Y, Lu Q, Liu W, Zhang Y, Li M, Zhao H. Joint modeling of genetically correlated diseases and functional annotations increases accuracy of polygenic risk prediction. PLoS Genetics 2017, 13:e1006836.
- Lin Z, Wang T, Yang C, Zhao H. On joint estimation of Gaussian graphical models for spatial and temporal data. Biometrics 2017, 73:769-779.
- Jiang J, Li C, Paul D, Yang C, Zhao H. On high-dimensional misspecified mixed model analysis in genome-wide association study. Annals of Statistics 2016, 44:2127-2160.
- Lu Q, Powles RL, Wang Q, He BJ, Zhao H. Integrative Tissue-Specific Functional Annotations in the Human Genome Provide Novel Insights on Many Complex Traits and Improve Signal Prioritization in Genome Wide Association Studies. PLoS Genetics 2016, 12:e1005947.
- Hu Y, Zhao H. CCor: A whole genome network-based similarity measure between two genes. Biometrics 2016, 72:1216-1225.
- Lee KY, Li B, Zhao H. On an additive partial correlation operator and nonparametric estimation of graphical models. Biometrika 2016, 103:513-530.
- Wang Q, Yang C, Gelernter J, Zhao H. Pervasive pleiotropy between psychiatric disorders and immune disorders revealed by integrative analysis of multiple GWAS. Human Genetics 2015, 134:1195-209.
- Lin Z, Sanders SJ, Li M, Sestan N, State MW, Zhao H. A Markov random field-based approach to characterizing human brain development using spatial-temporal transcriptome data. The Annals Of Applied Statistics 2015, 9:429-451.
- Lu Q, Hu Y, Sun J, Cheng Y, Cheung KH, Zhao H. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data. Scientific Reports 2015, 5:10576.
- Chung D, Yang C, Li C, Gelernter J, Zhao H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genetics 2014, 10:e1004787.
- Li B, Chun H, Zhao H. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis. Journal Of The American Statistical Association 2014, 109:1188-1204.
- Li C, Yang C, Gelernter J, Zhao H. Improving genetic risk prediction by leveraging pleiotropy. Human Genetics 2014, 133:639-50.
- Hou L, Chen M, Zhang CK, Cho J, Zhao H. Guilt by rewiring: gene prioritization through network rewiring in genome wide association studies. Human Molecular Genetics 2014, 23:2780-90.
- Qi X, Luo R, Zhao H. Sparse principal component analysis by choice of norm. Journal Of Multivariate Analysis 2013, 114:127-160.
- Li B, Chun H, Zhao H. Sparse Estimation of Conditional Graphical Models With Application to Gene Networks. Journal Of The American Statistical Association 2012, 107:152-167.
- Ma H, Zhao H. iFad: an integrative factor analysis model for drug-pathway association inference. Bioinformatics (Oxford, England) 2012, 28:1911-8.
- Chen M, Cho J, Zhao H. Incorporating biological pathways via a Markov random field model in genome-wide association studies. PLoS Genetics 2011, 7:e1001353.
- Luo R, Zhao H. Bayesian hierarchical modeling for signaling pathway inference from single cell interventional data. The Annals Of Applied Statistics 2011, 5:725-745.
Clinical Trials
Conditions | Study Title |
---|---|
Children's Health; Diabetes Mellitus - Type 2; Diseases of the Endocrine System | Pathogenesis of Youth and Type 2 Diabetes and Prediabetes |