Research & Publications
My main research interests, in both methodological and applied work, focus on two types of problems: (1) using network and systems biology methods to solve biological problems, and (2) integrating different data types and sources in statistical genetics and systems biology. Most of my work addresses a combination of the two. In recent years, network models of genomic data have been widely applied to characterize complex interactions among genes and phenotypes. In particular, the regulatory systems of a cell have been effectively described by networks, providing a better understanding of the molecular processes underlying cellular function and, ultimately, of disease pathogenesis. One of my contributions to this field has been the development of novel methods based on the Gaussian graphical model (GGM) and their application to multiple data types.
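For readers new to GGMs, the standard setup can be stated compactly (the notation below is ours, not taken from the original publications): the measured variables are modeled as jointly Gaussian, and two variables are joined by an edge exactly when their partial correlation, read off the inverse covariance (precision) matrix, is nonzero.

```latex
\begin{aligned}
\mathbf{x} &= (x_1, \dots, x_p)^\top \sim \mathcal{N}_p(\boldsymbol{\mu}, \Sigma),
  \qquad \Omega = \Sigma^{-1} = (\omega_{ij}), \\
\rho_{ij \cdot \mathrm{rest}} &= -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}},
  \qquad \text{edge } (i, j) \iff \rho_{ij \cdot \mathrm{rest}} \neq 0 \iff \omega_{ij} \neq 0.
\end{aligned}
```

When there are fewer samples than genes, the sample covariance matrix is singular and cannot be inverted directly, which is why small-sample (for example, shrinkage) estimators are used in practice.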
Extensive Research Description
SNP-gene network analysis: I developed a multistep approach to infer a gene-SNP network from gene expression and SNP genotype data. The approach consists of four steps: (1) construction of a Gaussian graphical model (GGM) based on small-sample estimation of partial correlations and false discovery rate (FDR) multiple testing; (2) extraction of a subnetwork of genes directly linked to a target candidate gene of interest; (3) identification of cis-acting regulatory variants for the genes in the subnetwork; and (4) evaluation of the identified cis-acting variants for trans-acting regulatory effects involving the target candidate gene. In an application of this method, we focused on two biological candidate genes in asthma pathogenesis, Interleukin 12 receptor, beta 2 (IL12RB2) and Interleukin 1 beta (IL1B), and built complex gene-SNP networks around them using genotyped variants and gene expression data from a childhood asthma cohort. After FDR adjustment, we identified 225 SNP-gene pairs with significant associations acting through IL12RB2 (suggesting trans-eQTLs) and 353 SNP-gene pairs acting through IL1B. We were also able to replicate a significant part of the network in two other independent data sets, demonstrating the reproducibility of our network-building approach.
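Steps (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the correlation matrix is shrunk toward the identity with a fixed, hypothetical intensity `lam`; edge p-values use a Fisher z approximation that requires more samples than genes; and edges are filtered with the Benjamini-Hochberg procedure. Dedicated small-sample tools (for example, the GeneNet R package) instead estimate the shrinkage intensity from the data and fit a mixture model to the partial correlations. All function names are hypothetical.

```python
import math

import numpy as np


def shrunk_partial_correlations(x: np.ndarray, lam: float = 0.2) -> np.ndarray:
    """Partial correlations from a correlation matrix shrunk toward the identity.

    x is a samples-by-genes matrix. lam is a fixed, hypothetical shrinkage
    intensity; data-driven estimators would choose it from the data instead.
    """
    r = np.corrcoef(x, rowvar=False)
    r_shrunk = (1.0 - lam) * r + lam * np.eye(r.shape[0])
    omega = np.linalg.inv(r_shrunk)  # precision (concentration) matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)  # rho_ij = -w_ij / sqrt(w_ii * w_jj)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def edge_pvalues(pcor: np.ndarray, n: int):
    """Two-sided Fisher-z p-values for every gene pair (upper triangle).

    Simplifying assumption: this normal approximation needs n > p + 1.
    """
    p = pcor.shape[0]
    dof = n - p - 1  # n - (p - 2 conditioning genes) - 3
    if dof <= 0:
        raise ValueError("Fisher-z approximation needs more samples than genes")
    rows, cols = np.triu_indices(p, k=1)
    r = np.clip(pcor[rows, cols], -0.999999, 0.999999)
    z = np.arctanh(r) * math.sqrt(dof)
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return rows, cols, pvals


def benjamini_hochberg(pvals: np.ndarray, q: float = 0.05) -> np.ndarray:
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = np.nonzero(pvals[order] <= q * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed[-1] + 1]] = True  # reject all up to the largest passing rank
    return reject


def target_subnetwork(x, genes, target, q=0.05, lam=0.2):
    """Steps (1)-(2): FDR-filtered GGM edges, then the target gene's direct neighbors."""
    pcor = shrunk_partial_correlations(x, lam)
    rows, cols, pvals = edge_pvalues(pcor, x.shape[0])
    keep = benjamini_hochberg(pvals, q)
    t = genes.index(target)
    neighbors = {
        genes[j if i == t else i]
        for i, j, k in zip(rows, cols, keep)
        if k and t in (i, j)
    }
    return sorted(neighbors)


# Toy usage on simulated expression data: gene A drives B and C; D is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
x = np.column_stack([a, a + 0.5 * rng.normal(size=200),
                     a + 0.5 * rng.normal(size=200), rng.normal(size=200)])
print(target_subnetwork(x, ["A", "B", "C", "D"], target="A"))
```

Steps (3) and (4), the cis- and trans-acting variant tests, are not shown; in standard eQTL analysis they would regress each gene's expression on SNP genotypes.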
Quantifying differential network connectivity: I developed a novel GGM-based approach for comparing gene-gene interaction (epistasis) networks across disease states. We compared the posterior probabilities of connectivity for each gene pair across two disease states and expressed the difference as a posterior odds ratio (postOR) for each pair, which can be used to identify the network components most relevant to disease status. This is one of the few methods that can objectively quantify differences in coexpression between two states (e.g., cases vs. controls, treated vs. untreated) within a formal statistical framework. We applied the method to two independent gene expression data sets from breast cancer tissues of varying histological grade, compared the network connectivity patterns across histological grades in each data set, and found significant overlap between the two studies. A significant number of the hub genes we identified had previously been linked to breast cancer, suggesting that differential connectivity mapping can effectively identify biologically relevant genes.
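A minimal sketch of the postOR comparison, assuming the posterior edge probabilities have already been estimated separately in each state (how they are obtained is not shown). The postOR is taken here to be the odds of connectivity in one state divided by the odds in the other. The gene-pair names, probabilities, and the simple hub score (a count of edges whose |log postOR| exceeds a hypothetical cutoff of log 10) are illustrative assumptions, not values or definitions from the study.

```python
import math


def posterior_odds_ratio(p_state1: float, p_state2: float, eps: float = 1e-9) -> float:
    """Odds that a gene pair is connected in state 1 divided by the odds in state 2."""
    p1 = min(max(p_state1, eps), 1.0 - eps)  # clip to avoid division by zero
    p2 = min(max(p_state2, eps), 1.0 - eps)
    return (p1 / (1.0 - p1)) / (p2 / (1.0 - p2))


def rank_pairs(post1: dict, post2: dict) -> list:
    """Gene pairs present in both states, sorted by |log postOR| (largest change first)."""
    scores = {
        pair: math.log(posterior_odds_ratio(post1[pair], post2[pair]))
        for pair in post1.keys() & post2.keys()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def hub_counts(ranked: list, min_abs_log_or: float = math.log(10.0)) -> dict:
    """Number of strongly differential edges touching each gene (a simple hub score)."""
    counts: dict = {}
    for (g1, g2), score in ranked:
        if abs(score) >= min_abs_log_or:
            for g in (g1, g2):
                counts[g] = counts.get(g, 0) + 1
    return counts


# Hypothetical posterior edge probabilities in two states (not real data).
high_grade = {("G1", "G2"): 0.9, ("G1", "G3"): 0.95, ("G2", "G3"): 0.5}
low_grade = {("G1", "G2"): 0.3, ("G1", "G3"): 0.2, ("G2", "G3"): 0.5}
ranked = rank_pairs(high_grade, low_grade)
print(ranked[0][0], round(math.exp(ranked[0][1]), 1))
print(hub_counts(ranked))
```

Working on the log scale makes gains and losses of connectivity symmetric: a pair whose odds rise tenfold and one whose odds fall tenfold receive scores of equal magnitude and opposite sign.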
Phenotypic Networks: We also applied the GGM to analyze the relationships among multiple disease-related phenotypes in two large, well-characterized studies of chronic obstructive pulmonary disease (COPD). In addition, we examined the associations between these COPD phenotypic networks and other factors, including case-control status, disease severity, and genetic variants. Using these phenotypic networks, we detected novel relationships between phenotypes that would not have been observed with traditional epidemiological approaches. For example, more extensive emphysema was associated with higher BMI in the control group but with lower BMI in the case group, and both associations were statistically significant. Since severe emphysema can lead to cachexia, the association of more extensive emphysema with lower BMI among COPD cases is consistent with clinical experience. Therefore, we believe that phenotypic network analysis of complex diseases could provide novel insights into disease susceptibility, disease severity, and genetic mechanisms.
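The group-specific sign reversal described above can be illustrated with a deliberately simpler, swapped-in technique: a within-group partial correlation between two phenotypes, adjusted for a covariate, followed by a Fisher z test of whether that correlation differs between cases and controls. This is not the phenotypic GGM itself; the variable names, the age covariate, and the simulated effect sizes are hypothetical, and the data are simulated, not study data.

```python
import math

import numpy as np


def partial_corr(x: np.ndarray, y: np.ndarray, covariates: np.ndarray) -> float:
    """Correlation of x and y after regressing each on the covariates plus an intercept."""
    design = np.column_stack([np.ones(len(x)), covariates])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def fisher_z_difference(r1: float, n1: int, r2: float, n2: int, k: int):
    """Two-sided test of r1 == r2 for partial correlations adjusted for k covariates."""
    se = math.sqrt(1.0 / (n1 - k - 3) + 1.0 / (n2 - k - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Simulated toy phenotypes: the emphysema-BMI slope flips sign between groups.
rng = np.random.default_rng(1)


def simulate(n: int, slope: float):
    """Hypothetical generator returning (emphysema, bmi, age) arrays for one group."""
    age = rng.normal(60.0, 8.0, n)
    emphysema = rng.normal(size=n)
    bmi = 27.0 + slope * emphysema + 0.05 * (age - 60.0) + rng.normal(size=n)
    return emphysema, bmi, age


controls = simulate(300, 0.5)
cases = simulate(300, -0.5)
r_ctrl = partial_corr(*controls)
r_case = partial_corr(*cases)
z, p = fisher_z_difference(r_ctrl, 300, r_case, 300, k=1)
print(f"controls r={r_ctrl:.2f}, cases r={r_case:.2f}, z={z:.1f}, p={p:.1e}")
```

Pooling both groups would largely cancel the two opposite slopes, which is why the association only becomes visible when it is estimated within each group.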
Other collaborative work: In addition to my methodological work, I have been involved in genetic data analysis for many projects and have served as a co-investigator on several NIH-funded grants, including eQTL analysis for the Childhood Asthma Management Program (CAMP), lung development, pharmacogenetics, and copy number variant analysis. In each project, I aim to bring fresh ideas and new research directions to the analysis by incorporating data-mining methods that have not been routinely used in genetic data analysis, including LASSO regression, Random Forests, and Bayesian mixture models.
Future Work: The methods I developed are versatile and could be applied to a variety of biological problems and data types. For example, they could be extended to incorporate next-generation sequencing (NGS) data or other genomic elements such as microRNAs, which will become more widely available in the future. Other data types, such as copy number variations and methylation, could also be incorporated into an integrated network model. I am also interested in extending my methodological work to other types of biological problems, including the identification of disease-relevant regulatory networks, disease subtyping, and network pharmacology.
Data Interpretation, Statistical; Lung Diseases; Models, Statistical; Biostatistics; Data Mining
Public Health Interests
Genetics, Genomics, Epigenetics