Research Departments & Organizations
My current research focuses on developing novel statistical and computational models to analyze large-scale genetic and genomic data from patients with chronic lung diseases, including asthma, idiopathic pulmonary fibrosis (IPF), sarcoidosis, and pediatric cystic fibrosis.
In the asthma study, conducted in collaboration with Dr. Geoffrey Chupp, we identified three subtypes of asthma, or TEA clusters, using gene expression data from induced sputum and blood: patients at high risk of near-fatal asthma attacks, patients with severe asthma symptoms, and patients with milder asthma. In addition, by analyzing gene expression in the blood, we could design a blood test that identifies a patient's asthma subtype and helps optimize the choice of treatment or drugs. Ultimately, this could lead to personalized treatment for asthma patients. To achieve these results, we developed a novel pathway-based clustering method; compared with traditional pathway-based clustering methods, it showed better robustness and accuracy on both simulated and real datasets. Currently, longitudinal induced sputum and whole blood samples are being collected from patients and prepared for RNA sequencing. To analyze these data, we are developing novel statistical and computational approaches that identify genetic information from the longitudinal RNA sequencing data and integrate it with the transcriptional profiles from the same data set to identify time-invariant molecular endotypes of asthma.
In the IPF and sarcoidosis study, conducted in collaboration with Dr. Naftali Kaminski, we are trying to understand the genomics and genetics of these patients. Second-generation sequencing technology was used to measure both gene expression levels and sequence mutations in the patients. My computational team is currently preprocessing and analyzing these sequencing data to better understand disease heterogeneity and pathogenesis using network analysis, data integration analysis, and longitudinal data analysis.
In the pediatric cystic fibrosis study, patients complete weekly surveys and attend clinical visits at which they provide sputum and stool samples. These samples were sequenced to understand which bacteria are present, how they change over time, and whether they behave differently in children with and without cystic fibrosis. My computational team is currently developing statistical and computational approaches to analyze the longitudinal 16S rRNA sequencing data.
Extensive Research Description
Analysis of longitudinal RNA sequencing data from asthma patients;
Analysis of longitudinal gene expression data from asthma patients under bronchial thermoplasty procedures;
Analysis of longitudinal microbiome sequencing data from children with cystic fibrosis;
RNA sequencing of IPF, alpha-1 antitrypsin deficiency (A1AT), and sarcoidosis (SARC) patients using Ion Torrent technology;
Single cell RNA sequencing data analysis.
ATS 2017. Washington, United States (2017)
Keystone Symposia: Asthma: From Pathway Biology to Precision Therapeutics. Keystone, United States (2017)
Gene Expression Analysis Pipeline for Ion Torrent RNAseq Data. Denver, United States (2015)
RNAseq in Sarcoidosis and Alpha-1 Antitrypsin Deficiency Patients. Denver, United States (2015)
Non-invasive Analysis of the Airway Transcriptome Discriminates Clinical Phenotypes of Asthma. Denver, United States (2015)
Non-invasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma. Washington, United States (2015)
Identifying Asthma Heterogeneity from Gene Expression Data by Integrating Pathway Information. Research Triangle Park, United States (2015)
Yan X, Liang A, Gomez J, Cohn L, Zhao H, Chupp GL. A novel pathway-based distance score enhances assessment of disease heterogeneity in gene expression. BMC Bioinformatics 2017, 18:309.
Yan X, Chu JH, Gomez J, Koenigs M, Holm C, He X, Perez MF, Zhao H, Mane S, Martinez FD, Ober C, Nicolae DL, Barnes KC, London SJ, Gilliland F, Weiss ST, Raby BA, Cohn L, Chupp GL. Noninvasive analysis of the sputum transcriptome discriminates clinical phenotypes of asthma. American Journal of Respiratory and Critical Care Medicine 2015, 191:1116-25.
Wan L, Yan X, Chen T, Sun F. Modeling RNA degradation for RNA-Seq with applications. Biostatistics (Oxford, England) 2012, 13:734-47.
Yan X, Li L, Lee JS, Zheng W, Ferguson J, Zhao H. Detecting functional rare variants by collapsing and incorporating functional annotation in Genetic Analysis Workshop 17 mini-exome data. BMC Proceedings 2011, 5 Suppl 9:S27.
Bickeböller H, Houwing-Duistermaat JJ, Wang X, Yan X. Dealing with high dimensionality for the identification of common and rare variants as main effects and for gene-environment interaction. Genetic Epidemiology 2011, 35 Suppl 1:S35-40.
Yan X, Sun F. Testing gene set enrichment for subset of genes: Sub-GSE. BMC Bioinformatics 2008, 9:362.
Yan X, Deng M, Fung WK, Qian M. Detecting differentially expressed genes by relative entropy. Journal of Theoretical Biology 2005, 234:395-402.