Research Departments & Organizations
My main research interests, in both methodological and applied work, focus on two types of problems: (1) using network/systems biology-based methods to solve biological problems, and (2) integrating different data types and sources in statistical genetics and systems biology. Most of my work addresses a combination of the two. In recent years, network models of genomic data have frequently been applied to characterize complex interactions among genes and phenotypes. Specifically, the regulatory systems of a cell have been effectively described by networks, providing a better understanding of the molecular processes underlying cellular function and ultimately improving our understanding of disease pathogenesis. One of my contributions to this field has been to develop novel methodologies based on the Gaussian graphical model (GGM) and to apply them to multiple data types.
Extensive Research Description
SNP-gene network analysis: I developed a multistep approach to infer a gene-SNP network from gene expression and genotyped SNP data. The approach consists of four steps: (1) construction of a Gaussian graphical model (GGM) based on small-sample estimation of partial correlation and false-discovery rate (FDR) multiple testing; (2) extraction of a subnetwork of genes directly linked to a target candidate gene of interest; (3) identification of cis-acting regulatory variants for the genes composing the subnetwork; and (4) evaluation of the identified cis-acting variants for trans-acting regulatory effects of the target candidate gene. In an application of this method, we focused on two biologic candidate genes in asthma pathogenesis, interleukin 12 receptor, beta 2 (IL12RB2) and interleukin 1B (IL1B), and built complex gene-SNP networks around them using the genotyped variants and gene expression data from a childhood asthma cohort. After FDR adjustment, we identified 225 SNP-gene pairs with significant association working through IL12RB2 (suggesting trans-eQTL), and 353 SNP-gene pairs for IL1B. We were also able to reproduce a significant part of the network in two other independent data sets, demonstrating the reproducibility of our network building.
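Steps (1) and (2) above can be illustrated with a minimal sketch, under several assumptions that are not taken from the published method: it uses the plain inverse of the sample covariance matrix rather than a small-sample (shrinkage) estimator, tests each partial correlation with Fisher's z-transform, and applies the Benjamini-Hochberg procedure for FDR control. The function names and the simulated four-variable data set are hypothetical, and steps (3) and (4) are not covered.

```python
import math
import random

def covariance(data):
    """Sample covariance matrix; rows are samples, columns are genes."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    p = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def partial_correlations(data):
    """Partial correlation of each pair given all other variables."""
    omega = invert(covariance(data))  # precision matrix
    p = len(omega)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(p)] for i in range(p)]

def edge_pvalue(r, n, p):
    """Two-sided p-value from Fisher's z-transform, conditioning on p - 2 variables."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - (p - 2) - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    cutoff = 0
    for rank, k in enumerate(order, start=1):
        if pvalues[k] <= alpha * rank / m:
            cutoff = rank
    return set(order[:cutoff])

def ggm_edges(data, alpha=0.05):
    """Step 1: keep pairs whose partial correlation survives FDR control."""
    n, p = len(data), len(data[0])
    pcor = partial_correlations(data)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    pvals = [edge_pvalue(pcor[i][j], n, p) for i, j in pairs]
    keep = benjamini_hochberg(pvals, alpha)
    return [pairs[k] for k in sorted(keep)]

def neighbours(edges, target):
    """Step 2: variables directly linked to the target in the network."""
    return sorted({j if i == target else i for i, j in edges if target in (i, j)})

# Simulated expression data (hypothetical): a chain 0 -> 1 -> 2 plus an unrelated variable 3.
random.seed(0)
data = []
for _ in range(200):
    x0 = random.gauss(0, 1)
    x1 = x0 + 0.5 * random.gauss(0, 1)
    x2 = x1 + 0.5 * random.gauss(0, 1)
    x3 = random.gauss(0, 1)
    data.append([x0, x1, x2, x3])

edges = ggm_edges(data)
print("edges:", edges)
print("neighbours of variable 1:", neighbours(edges, 1))
```

Because partial correlation conditions on all other variables, variables 0 and 2 are strongly correlated marginally but are linked in the true network only through variable 1, which is the kind of direct versus indirect distinction a GGM is meant to capture.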
Quantifying differential network connectivity: I developed a novel GGM-based approach for inferring the associations in gene-gene interaction (epistasis) networks across disease states. We compared the posterior probabilities of connectivity for each gene pair across two disease states, expressed as a posterior odds ratio (postOR) for each pair, which can be used to identify the network components most relevant to disease status. This method is one of the few that can objectively quantify the differences in coexpression between two states (e.g., cases vs. controls, treated vs. untreated) within a formalized statistical framework. We applied the method to two independent gene expression data sets from breast cancer tissues of varying histological grade. We compared the network connectivity patterns observed across breast cancers of different histological grades in these two data sets, and found significant overlap across the studies. A significant number of the hub genes that we identified had also been previously linked to breast cancer, suggesting that differential connectivity mapping is exquisitely specific in the identification of biologically relevant genes.
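As a rough illustration of the posterior odds ratio, the sketch below assumes that the posterior probability of an edge has already been estimated for each gene pair in each state, and that postOR is the ratio of the two posterior odds. That exact form, the clipping constant `eps`, the function names, and the gene labels are all assumptions made for illustration rather than details from the published method.

```python
import math

def posterior_odds_ratio(p_state1, p_state2, eps=1e-6):
    """Ratio of posterior odds of an edge in state 1 versus state 2.

    Probabilities are clipped to [eps, 1 - eps] to avoid division by zero.
    """
    p1 = min(max(p_state1, eps), 1 - eps)
    p2 = min(max(p_state2, eps), 1 - eps)
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

def rank_differential_pairs(post1, post2):
    """Rank gene pairs by |log postOR|, most differentially connected first."""
    scores = {
        pair: math.log(posterior_odds_ratio(post1[pair], post2[pair]))
        for pair in post1 if pair in post2
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical posterior edge probabilities in two disease states.
cases = {("GENE_A", "GENE_B"): 0.9, ("GENE_A", "GENE_C"): 0.5, ("GENE_B", "GENE_C"): 0.2}
controls = {("GENE_A", "GENE_B"): 0.1, ("GENE_A", "GENE_C"): 0.5, ("GENE_B", "GENE_C"): 0.3}

ranked = rank_differential_pairs(cases, controls)
for pair, log_or in ranked:
    print(pair, round(log_or, 3))
```

Ranking on the absolute log scale treats gains and losses of connectivity symmetrically, so a pair that is connected only in cases and a pair that is connected only in controls can both rise to the top.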
Phenotypic Networks: We also applied the GGM to analyze the relationships among multiple disease-related phenotypes, using two large, well-characterized studies of chronic obstructive pulmonary disease (COPD). We also examined the associations between these COPD phenotypic networks and other factors, including case-control status, disease severity, and genetic variants. Using these phenotypic networks, we detected novel relationships between phenotypes that would not have been observed using traditional epidemiological approaches. For example, greater emphysema was associated with higher BMI in the control group but with lower BMI in the case group, and both associations were statistically significant. Since severe emphysema can lead to cachexia, the association of greater emphysema with lower BMI among COPD cases is consistent with clinical experience. We therefore believe that phenotypic network analysis of complex diseases could provide novel insights into disease susceptibility, disease severity, and genetic mechanisms.
Other collaborative work: In addition to my methodological work, I have been involved in genetic data analysis for many projects, serving as co-investigator on several NIH-funded grants, including eQTL analysis for the Childhood Asthma Management Program (CAMP), lung development, pharmacogenetics, and copy number variant analysis. For each project I actively look to bring fresh ideas and new research directions to the analysis, incorporating novel data-mining methods that have not been routinely used in genetic data analysis, including LASSO regression, Random Forest, and Bayesian mixture models.
Future Work: The methods that I developed are versatile and have the potential to be applied to a variety of biological problems and data types. For example, my methods could be extended to incorporate next-generation sequencing (NGS) data, or other genomic elements such as microRNA, which will become more commonly available in the future. Other data types, such as copy number variations and methylation, could also be incorporated into an integrated model of network complexity. I am also interested in extending my methodological work to different types of biological problems, including the identification of disease-relevant regulatory networks, disease subtyping, and network pharmacology.
Chu JH, Hersh CP, Castaldi PJ, Cho MH, Raby BA, Laird N, Bowler R, Rennard S, Loscalzo J, Quackenbush J, Silverman EK. Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD. BMC Systems Biology 2014, 8:78.
Clarke B, Chu JH. Generic Feature Selection with Short Fat Data. Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics 2014, 68:145-162.
Chu JH, Rogers A, Ionita-Laza I, Darvishi K, Mills RE, Lee C, Raby BA. Copy number variation genotyping using family information. BMC Bioinformatics 2013, 14:157.
Chu JH, Lazarus R, Carey VJ, Raby BA. Quantifying differential gene connectivity between disease states for objective identification of disease-relevant genes. BMC Systems Biology 2011, 5:89.
Chu JH, Weiss ST, Carey VJ, Raby BA. A graphical model approach for inferring large-scale networks integrating gene expression and genetic polymorphism. BMC Systems Biology 2009, 3:55.
Chu J, Clyde MA, Liang F. Bayesian Function Estimation with Overcomplete Wavelet Dictionary. Statistica Sinica 2009, 19:1419-1438.
Yan X, Chu JH, Gomez J, Koenigs M, Holm C, He X, Perez MF, Zhao H, Mane S, Martinez FD, Ober C, Nicolae DL, Barnes KC, London SJ, Gilliland F, Weiss ST, Raby BA, Cohn L, Chupp GL. Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma. American Journal of Respiratory and Critical Care Medicine 2015, 191:1116-25.
Haspel JA, Chettimada S, Shaik RS, Chu JH, Raby BA, Cernadas M, Carey V, Process V, Hunninghake GM, Ifedigbo E, Lederer JA, Englert J, Pelton A, Coronata A, Fredenburgh LE, Choi AM. Circadian rhythm reprogramming during lung inflammation. Nature Communications 2014, 5:4753.
Rogers AJ, Chu JH, Darvishi K, Ionita-Laza I, Lehmann H, Mills R, Lee C, Raby BA. Copy number variation prevalence in known asthma genes and their impact on asthma susceptibility. Clinical and Experimental Allergy: Journal of the British Society for Allergy and Clinical Immunology 2013, 43:455-62.
Hunninghake GM, Chu JH, Sharma SS, Cho MH, Himes BE, Rogers AJ, Murphy A, Carey VJ, Raby BA. The CD4+ T-cell transcriptome and serum IgE in asthma: IL17RB and the role of sex. BMC Pulmonary Medicine 2011, 11:17.
Wang J, Carvajal-Carmona LG, Chu JH, Zauber AG, Kubo M, Matsuda K, Dunlop M, Houlston RS, Sieber O, Lipton L, Gibbs P, Martin NG, Montgomery GW, Young J, Baird PN, Ratain MJ, Nakamura Y, Weiss ST, Tomlinson I, Bertagnolli MM. Germline variants and advanced colorectal adenomas: adenoma prevention with celecoxib trial genome-wide association study. Clinical Cancer Research: An Official Journal of the American Association for Cancer Research 2013, 19:6430-7.
Xu M, Tantisira KG, Wu A, Litonjua AA, Chu JH, Himes BE, Damask A, Weiss ST. Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers. BMC Medical Genetics 2011, 12:90.
Murphy A, Won S, Rogers A, Chu JH, Raby BA, Lange C. On the genome-wide analysis of copy number variants in family-based designs: methods for combining family-based and population-based information for testing dichotomous or quantitative traits, or completely ascertained samples. Genetic Epidemiology 2010, 34:582-90.
Murphy A, Chu JH, Xu M, Carey VJ, Lazarus R, Liu A, Szefler SJ, Strunk R, Demuth K, Castro M, Hansel NN, Diette GB, Vonakis BM, Adkinson NF, Klanderman BJ, Senter-Sylvia J, Ziniti J, Lange C, Pastinen T, Raby BA. Mapping of numerous disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes. Human Molecular Genetics 2010, 19:4745-57.