Data Interpretation, Statistical; Lung Diseases; Models, Statistical; Biostatistics; Data Mining
Public Health Interests
Genetic epidemiology; Statistical genetics
My main research interests, in both methodological and applied work, focus on two types of problems: (1) using network/systems biology-based methods to solve biological problems, and (2) integrating different data types and sources in statistical genetics and systems biology. Most of my work addresses a combination of the two. In recent years, network models of genomic data have frequently been applied to characterize complex interactions among genes and phenotypes. Specifically, the regulatory systems of a cell have been effectively described by networks, providing a better understanding of the molecular processes underlying cellular function and ultimately improving our understanding of disease pathogenesis. One aspect of my contribution to this field has been to develop novel methodologies based on the Gaussian graphical model (GGM) and to apply them to multiple data types.
Extensive Research Description
SNP-gene network analysis: I developed a multistep approach to infer a gene-SNP network from gene expression and genotyped SNP data. The approach consists of 4 steps: (1) construction of a graphical Gaussian model (GGM) based on small-sample estimation of partial correlation and false-discovery rate (FDR) multiple testing; (2) extraction of a subnetwork of genes directly linked to a target candidate gene of interest; (3) identification of cis-acting regulatory variants for the genes composing the subnetwork; and (4) evaluation of the identified cis-acting variants for trans-acting regulatory effects of the target candidate gene. In an application of this method, we focused on two biologic candidate genes in asthma pathogenesis, Interleukin 12 receptor, beta 2 (IL12RB2) and Interleukin 1B (IL1B), and built complex gene-SNP networks around them using the genotyped variants and gene expression data in a childhood asthma cohort. After FDR adjustment we identified 225 SNP-gene pairs with significant association working through IL12RB2 (suggesting trans-eQTL), and 353 SNP-gene pairs for IL1B. We were also able to replicate a significant part of the network in two other independent data sets, demonstrating the reproducibility of our network building.
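As an illustration only, and not the published implementation, steps (1) and (2) of such an approach could be sketched in Python as follows. The ridge penalty, the fixed threshold, and the function names are assumptions: they stand in for the small-sample shrinkage estimator and the FDR-based edge selection described above.

```python
import numpy as np


def partial_correlations(X, ridge=0.1):
    """Partial correlation matrix from a ridge-regularized covariance.

    X is a samples-by-variables array. `ridge` is a hypothetical stand-in
    for the small-sample shrinkage estimator mentioned in the text.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    # Regularize so the covariance is invertible even when samples are few.
    precision = np.linalg.inv(S + ridge * np.eye(p))
    d = np.sqrt(np.diag(precision))
    # Partial correlation: rho_ij = -omega_ij / sqrt(omega_ii * omega_jj).
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def neighbors(pcor, target, threshold=0.2):
    """Variables directly linked to `target` in the thresholded network.

    A fixed threshold is an assumption made for brevity; the approach
    described in the text selects edges by FDR-controlled testing instead.
    """
    row = np.abs(pcor[target])
    return [j for j in range(len(row)) if j != target and row[j] > threshold]
```

For data simulated as a chain (variable 0 drives 1, which drives 2), `neighbors(pcor, 1)` would be expected to return both ends of the chain, while variables 0 and 2 should not be linked directly once variable 1 is conditioned on.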
Quantifying differential network connectivity: I developed a novel approach, based on GGM, for inferring the associations in gene-gene interaction (epistasis) networks across disease states. We compared the posterior probabilities of connectivity for each gene pair across two disease states, expressed as a posterior odds-ratio (postOR) for each pair, which can be used to identify network components most relevant to disease status. This method is one of the few that can objectively quantify the differences in coexpression between two states (e.g., cases vs. controls, treated vs. untreated) within a formalized statistical framework. We applied the method to two independent gene expression data sets from breast cancer tissues of varying histological grade. We compared the network connectivity patterns observed across breast cancers of different histological grades in these two data sets, and found significant overlap across the studies. A significant number of hub genes that we identified had also been previously linked to breast cancer, suggesting that differential connectivity mapping is exquisitely specific in the identification of biologically relevant genes.
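A minimal sketch of the postOR idea, assuming it is the ratio of the posterior odds of an edge in one state to the posterior odds of the same edge in the other state. The exact definition used in the published method may differ; the function name and the clipping constant `eps` are hypothetical.

```python
def posterior_odds_ratio(p_state1, p_state2, eps=1e-6):
    """Ratio of posterior odds of connectivity between two disease states.

    p_state1 and p_state2 are posterior probabilities that a gene pair is
    connected in each state. Values are clipped to [eps, 1 - eps]
    (an assumption) to avoid division by zero.
    """
    p1 = min(max(p_state1, eps), 1.0 - eps)
    p2 = min(max(p_state2, eps), 1.0 - eps)
    odds1 = p1 / (1.0 - p1)
    odds2 = p2 / (1.0 - p2)
    return odds1 / odds2
```

Under this definition, a value far from 1 in either direction flags a gene pair whose connectivity differs between the two states, and a value near 1 indicates similar support for the edge in both.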
Phenotypic Networks: We also applied the GGM to analyze the relationships among multiple disease-related phenotypes. We applied this method to two large, well-characterized studies of chronic obstructive pulmonary disease (COPD). We also examined the associations between these COPD phenotypic networks and other factors, including case-control status, disease severity, and genetic variants. Using these phenotypic networks, we detected novel relationships between phenotypes that would not have been observed using traditional epidemiological approaches. For example, higher emphysema was associated with higher BMI in the control group but with lower BMI in the case group, and both associations were statistically significant. Since severe emphysema can lead to cachexia, the association of higher emphysema with lower BMI among COPD cases is consistent with clinical experience. Therefore, we believe that phenotypic network analysis of complex diseases could provide novel insights into disease susceptibility, disease severity, and genetic mechanisms.
Other collaborative work: In addition to my methodological work, I have been involved in genetic data analysis for many projects, serving as co-investigator on several NIH-funded grants, including eQTL analysis for the Childhood Asthma Management Program (CAMP), lung development, pharmacogenetics, and copy number variant analysis. For each project I actively look to bring fresh ideas and new research directions to the analysis, incorporating data-mining methods that had not been routinely used in genetic data analysis, including LASSO regression, Random Forest, and Bayesian mixture models.
Future Work: The methods that I developed are versatile and have the potential to be applied to a variety of biological problems and data types. For example, my methods could be extended to incorporate next-generation sequencing (NGS) data, or other genomic elements such as microRNA, which will become more commonly available in the future. Other data types, such as copy number variation and methylation, could also be incorporated into an integrated model of network complexity. I am also interested in extending my methodological work to different types of biological problems, including the identification of disease-relevant regulatory networks, disease subtyping, and network pharmacology.
Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD.
Chu JH, Hersh CP, Castaldi PJ, Cho MH, Raby BA, et al. BMC systems biology. 2014; 8:78.
Generic Feature Selection with Short Fat Data.
Clarke B, Chu JH. Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics. 2014; 68(2):145-162. NIHMSID: NIHMS619926
Copy number variation genotyping using family information.
Chu JH, Rogers A, Ionita-Laza I, Darvishi K, Mills RE, et al. BMC bioinformatics. 2013; 14:157.
Quantifying differential gene connectivity between disease states for objective identification of disease-relevant genes.
Chu JH, Lazarus R, Carey VJ, Raby BA. BMC systems biology. 2011; 5:89.
A graphical model approach for inferring large-scale networks integrating gene expression and genetic polymorphism.
Chu JH, Weiss ST, Carey VJ, Raby BA. BMC systems biology. 2009; 3:55.
Bayesian Function Estimation with Overcomplete Wavelet Dictionary.
Chu J, Clyde MA, Liang F. Statistica Sinica. 2009; 19:1419-1438.
Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma.
Yan X, Chu JH, Gomez J, Koenigs M, Holm C, et al. American journal of respiratory and critical care medicine. 2015; 191(10):1116-25.
Circadian rhythm reprogramming during lung inflammation.
Haspel JA, Chettimada S, Shaik RS, Chu JH, Raby BA, et al. Nature communications. 2014; 5:4753. NIHMSID: NIHMS615497
Copy number variation prevalence in known asthma genes and their impact on asthma susceptibility.
Rogers AJ, Chu JH, Darvishi K, Ionita-Laza I, Lehmann H, et al. Clinical and experimental allergy : journal of the British Society for Allergy and Clinical Immunology. 2013; 43(4):455-62.
The CD4+ T-cell transcriptome and serum IgE in asthma: IL17RB and the role of sex.
Hunninghake GM, Chu JH, Sharma SS, Cho MH, Himes BE, et al. BMC pulmonary medicine. 2011; 11:17.
Germline variants and advanced colorectal adenomas: adenoma prevention with celecoxib trial genome-wide association study.
Wang J, Carvajal-Carmona LG, Chu JH, Zauber AG, Kubo M, et al. Clinical cancer research : an official journal of the American Association for Cancer Research. 2013; 19(23):6430-7. NIHMSID: NIHMS588437
Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers.
Xu M, Tantisira KG, Wu A, Litonjua AA, Chu JH, et al. BMC medical genetics. 2011; 12:90.
On the genome-wide analysis of copy number variants in family-based designs: methods for combining family-based and population-based information for testing dichotomous or quantitative traits, or completely ascertained samples.
Murphy A, Won S, Rogers A, Chu JH, Raby BA, et al. Genetic epidemiology. 2010; 34(6):582-90. NIHMSID: NIHMS169453
Mapping of numerous disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes.
Murphy A, Chu JH, Xu M, Carey VJ, Lazarus R, et al. Human molecular genetics. 2010; 19(23):4745-57.
Full List of PubMed Publications
- Tzouvelekis A, Herazo-Maya JD, Slade M, Chu JH, Deiuliis G, Ryu C, Li Q, Sakamoto K, Ibarra G, Pan H, Gulati M, Antin-Ozerkis D, Herzog EL, Kaminski N: Validation of the prognostic value of MMP-7 in idiopathic pulmonary fibrosis. Respirology. 2017 Apr; 2016 Oct 19. PMID: 27761978
- Chu JH, Hart JE, Chhabra D, Garshick E, Raby BA, Laden F: Gene expression network analyses in response to air pollution exposures in the trucking industry. Environ Health. 2016 Nov 3; 2016 Nov 3. PMID: 27809917
- Wellman TJ, de Prost N, Tucci M, Winkler T, Baron RM, Filipczak P, Raby B, Chu JH, Harris RS, Musch G, Dos Reis Falcao LF, Capelozzi V, Venegas JG, Vidal Melo MF: Lung Metabolic Activation as an Early Biomarker of Acute Respiratory Distress Syndrome and Local Gene Expression Heterogeneity. Anesthesiology. 2016 Nov. PMID: 27611185
- Yan X, Chu JH, Gomez J, Koenigs M, Holm C, He X, Perez MF, Zhao H, Mane S, Martinez FD, Ober C, Nicolae DL, Barnes KC, London SJ, Gilliland F, Weiss ST, Raby BA, Cohn L, Chupp GL: Noninvasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma. Ann Am Thorac Soc. 2016 Mar. PMID: 27027945
- Yan X, Chu JH, Gomez J, Koenigs M, Holm C, He X, Perez MF, Zhao H, Mane S, Martinez FD, Ober C, Nicolae DL, Barnes KC, London SJ, Gilliland F, Weiss ST, Raby BA, Cohn L, Chupp GL: Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma. Am J Respir Crit Care Med. 2015 May 15. PMID: 25763605
- Haspel JA, Chettimada S, Shaik RS, Chu JH, Raby BA, Cernadas M, Carey V, Process V, Hunninghake GM, Ifedigbo E, Lederer JA, Englert J, Pelton A, Coronata A, Fredenburgh LE, Choi AM: Circadian rhythm reprogramming during lung inflammation. Nat Commun. 2014 Sep 11; 2014 Sep 11. PMID: 25208554
- Chu JH, Hersh CP, Castaldi PJ, Cho MH, Raby BA, Laird N, Bowler R, Rennard S, Loscalzo J, Quackenbush J, Silverman EK: Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD. BMC Syst Biol. 2014 Jun 25; 2014 Jun 25. PMID: 24964944
- Wang J, Carvajal-Carmona LG, Chu JH, Zauber AG, APC Trial Collaborators., Kubo M, Matsuda K, Dunlop M, Houlston RS, Sieber O, Lipton L, Gibbs P, Martin NG, Montgomery GW, Young J, Baird PN, Ratain MJ, Nakamura Y, Weiss ST, Tomlinson I, Bertagnolli MM: Germline variants and advanced colorectal adenomas: adenoma prevention with celecoxib trial genome-wide association study. Clin Cancer Res. 2013 Dec 1; 2013 Oct 1. PMID: 24084763
- Chu JH, Rogers A, Ionita-Laza I, Darvishi K, Mills RE, Lee C, Raby BA: Copy number variation genotyping using family information. BMC Bioinformatics. 2013 May 9; 2013 May 9. PMID: 23656838
- Xu M, Tantisira KG, Wu A, Litonjua AA, Chu JH, Himes BE, Damask A, Weiss ST: Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers. BMC Med Genet. 2011 Jun 30; 2011 Jun 30. PMID: 21718536
- Chu JH, Lazarus R, Carey VJ, Raby BA: Quantifying differential gene connectivity between disease states for objective identification of disease-relevant genes. BMC Syst Biol. 2011 May 31; 2011 May 31. PMID: 21627793
- Hunninghake GM, Chu JH, Sharma SS, Cho MH, Himes BE, Rogers AJ, Murphy A, Carey VJ, Raby BA: The CD4+ T-cell transcriptome and serum IgE in asthma: IL17RB and the role of sex. BMC Pulm Med. 2011 Apr 7; 2011 Apr 7. PMID: 21473777
- Sharma S, Murphy A, Howrylak J, Himes B, Cho MH, Chu JH, Hunninghake GM, Fuhlbrigge A, Klanderman B, Ziniti J, Senter-Sylvia J, Liu A, Szefler SJ, Strunk R, Castro M, Hansel NN, Diette GB, Vonakis BM, Adkinson NF Jr, Carey VJ, Raby BA: The impact of self-identified race on epidemiologic studies of gene expression. Genet Epidemiol. 2011 Feb. PMID: 21254216
- Carpe N, Mandeville I, Ribeiro L, Ponton A, Martin JG, Kho AT, Chu JH, Tantisira K, Weiss ST, Raby BA, Kaplan F: Genetic influences on asthma susceptibility in the developing lung. Am J Respir Cell Mol Biol. 2010 Dec; 2010 Jan 29. PMID: 20118217
- Murphy A, Chu JH, Xu M, Carey VJ, Lazarus R, Liu A, Szefler SJ, Strunk R, Demuth K, Castro M, Hansel NN, Diette GB, Vonakis BM, Adkinson NF Jr, Klanderman BJ, Senter-Sylvia J, Ziniti J, Lange C, Pastinen T, Raby BA: Mapping of numerous disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes. Hum Mol Genet. 2010 Dec 1; 2010 Sep 10. PMID: 20833654
- Murphy A, Won S, Rogers A, Chu JH, Raby BA, Lange C: On the genome-wide analysis of copy number variants in family-based designs: methods for combining family-based and population-based information for testing dichotomous or quantitative traits, or completely ascertained samples. Genet Epidemiol. 2010 Sep. PMID: 20718041
- Chu JH, Weiss ST, Carey VJ, Raby BA: A graphical model approach for inferring large-scale networks integrating gene expression and genetic polymorphism. BMC Syst Biol. 2009 May 27; 2009 May 27. PMID: 19473523