activity The Journal of Allergy and Clinical Immunology
2016 - Present
activity BMC Bioinformatics
2009 - Present
activity Bioinformatics
2009 - Present
activity Genomics, Computational Biology and Technology Study Section
06/06/2024 - 06/07/2024
activity A Hybrid Machine Learning and regression Method for Cell Type Deconvolution of Spatial Barcoding-based Transcriptomic data
activity Spatial Deconvolution Method Considering Platform Effect Removal, Sparsity and Spatial Information
activity A Hybrid Machine Learning and regression Method for Cell Type Deconvolution of Spatial Barcoding-based Transcriptomic data
activity Statistical Challenges in Analyses of Single Cell RNA Sequencing Data and Spatial Transcriptomic Data
activity Idiopathic Pulmonary Fibrosis at Single-Cell Resol
activity Statistical Modeling in Single Cell Genomics Data
activity Statistical challenges in single cell RNA sequencing data analysis
activity Cell Lineage Defined By Single Cell Transcriptomics Of The Sputum In Asthma
activity Pathway based clustering of longitudinal RNAseq of induced sputum in asthma patients reveals stable transcriptional endotypes of asthma
activity Longitudinal RNA Sequencing Data of Induced Sputum in Asthma Patients Reveals Stable Transcriptional Endotypes of Asthma Associated with Asthma Severity
activity Longitudinal RNA sequencing data of induced sputum in asthma patients reveals stable transcriptional endotypes of asthma associated with asthma severity
activity Noninvasive Analysis of the Sputum Transcriptome Discriminates Clinical Phenotypes of Asthma
activity Longitudinal RNA sequencing data of induced sputum in asthma patients reveals stable transcriptional endotypes of asthma associated with asthma severity
activity Gene Expression Analysis Pipeline for Ion Torrent RNAseq Data
Abstract/Synopsis
RATIONALE: Next-generation sequencing (NGS) technology is widely used to measure gene expression profiles. Among the popular NGS technologies, Ion Torrent Proton (Life Technologies) has been shown to have a comparable sequencing error rate and has been used in several studies for microbiome sequencing, DNA sequencing and RNA sequencing. While most NGS computational analysis pipelines were developed for the Illumina technology, pipelines optimized for Ion Torrent Proton mRNA sequencing data remain underdeveloped. To address this, we evaluate the performance of different computational methods on RNAseq data from patients with Sarcoidosis, Alpha-1 Antitrypsin Deficiency and idiopathic pulmonary fibrosis (IPF) in order to assemble an optimized analysis pipeline for Ion Torrent Proton RNAseq data.
METHODS: The analysis pipeline comprises data quality assessment, read alignment, data visualization, and gene expression level estimation and comparison. For each step, we apply different computational methods to the data and compare their results. To optimize read alignment, we apply TopHat2, TMAP and a two-stage mapping strategy to the data and compare their mapping rates and their coverage biases within genes. For gene expression level estimation and comparison, Cufflinks and DESeq are applied and compared based on their correlation with gene expression levels measured in the same samples using Agilent arrays. The best-performing method at each step is employed in the analysis pipeline.
RESULTS: On the Ion Torrent Proton RNAseq data measured in this analysis, the two-stage mapping method showed an average base-level mapping rate of 90.25%, which is much higher than that of TopHat2 and TMAP (p value<2.2e-9), while the coverage bias of the two methods is comparable (average p value=0.1). Gene expression levels estimated by Cufflinks showed an average correlation of 0.71 (p value=2.2e-16) with the gene expression levels measured by Agilent microarrays in IPF patients. We are currently applying DESeq for comparison with Cuffdiff and Cufflinks, and we expect to obtain the comparison results within the next three months.
CONCLUSIONS: We have developed an optimized analysis pipeline for Ion Torrent Proton RNA sequencing data generated with the PI chip. Results from the pipeline showed high correlation with expression analysis using microarrays.
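The comparison between RNAseq estimates and microarray measurements described above can be illustrated with a minimal sketch. It assumes that FPKM values are log2-transformed with a pseudocount of 1 before being correlated with log2 array intensities; the abstract does not state which transformation was used, and the gene names and values below are hypothetical toy data, not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rnaseq_vs_array_correlation(fpkm, array_log2, pseudocount=1.0):
    """Correlate log2(FPKM + pseudocount) with log2 array intensities.

    Only genes present on both platforms are compared. The log transform
    and the pseudocount are assumptions made for this sketch.
    """
    genes = sorted(set(fpkm) & set(array_log2))
    seq = [math.log2(fpkm[g] + pseudocount) for g in genes]
    arr = [array_log2[g] for g in genes]
    return pearson(seq, arr)


# Hypothetical toy values for a single sample (not data from the study).
fpkm = {"GENE_A": 0.0, "GENE_B": 3.0, "GENE_C": 15.0, "GENE_D": 63.0}
array_log2 = {"GENE_A": 5.0, "GENE_B": 7.0, "GENE_C": 9.0, "GENE_D": 11.0, "GENE_E": 8.0}

r = rnaseq_vs_array_correlation(fpkm, array_log2)
print(f"Pearson r over shared genes: {r:.2f}")
```

In practice such a coefficient would be computed per sample and then averaged, which is one plausible reading of the "average correlation" reported in the abstract.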
activity RNAseq in Sarcoidosis and Alpha-1 Antitrypsin Deficiency Patients
Abstract/Synopsis
RATIONALE: Alpha-1 Antitrypsin Deficiency (A1AT) and Sarcoidosis (SARC) are two under-recognized chronic lung diseases. The Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study is collecting samples from both A1AT and SARC patients for parallel microbiome and transcriptome studies to identify biomarkers that indicate the current status of the lung diseases and predict their progression and response to therapy. To understand the transcriptomic features of the diseases, we analyzed RNA sequencing data from the GRADS cohort and from idiopathic pulmonary fibrosis (IPF) patients to assess data quality and identify gene signatures that distinguish the two diseases and IPF.
METHODS: Samples were purified for mRNA and sequenced using the Ion Torrent Proton with PI Chip (Life Technologies). Reads were mapped to the human genome using a two-stage mapping strategy suggested by Ion Torrent. Fragments Per Kilobase of transcript per Million mapped reads (FPKMs) were estimated using Cufflinks. Principal component analysis (PCA) was applied to visualize the data and identify outlying samples. The IPF samples had previously been profiled for gene expression using Agilent microarrays, and the correlation between the RNA sequencing data and the microarray data was assessed to better understand the accuracy of the sequencing technology. Finally, differentially expressed genes (DEGs) between the two diseases were identified using Cuffdiff. Pathways enriched for these DEGs were identified using GeneGO MetaCore.
RESULTS: The sequencing reads were of high quality according to the quality assessment report from FastQC. The average base-level mapping rate across the samples is 90.25%. In the IPF samples, the FPKMs estimated from the RNA sequencing data are highly correlated with the expression levels measured by the Agilent microarrays (Pearson correlation coefficient=0.71, p value=2.2e-16). Data visualization using PCA showed global transcriptomic differences between the three diseases. Taken together, the sequencing data are of high quality and carry information on the differences between the three diseases. The FPKMs were compared across the three diseases to identify differentially expressed genes. These genes were then analyzed for biological pathway enrichment to better understand the pathobiology of the diseases.
CONCLUSIONS: Ion Torrent Proton provided high-quality RNAseq data for our SARC, A1AT and IPF samples. Differentially expressed genes were identified between the three diseases and analyzed for enrichment in biologic pathways.
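For readers unfamiliar with the FPKM unit used above, the following sketch spells out its textbook definition for a single transcript. It is only the naive formula: Cufflinks estimates FPKMs with a statistical model that also handles reads shared between isoforms, which this sketch does not attempt. The function name and the numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments / (length in kb) / (total mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 25 million mapped fragments.
value = fpkm(500, 2000, 25_000_000)
print(value)  # 10.0
```

Normalizing by both transcript length and library size is what makes values comparable across genes and across samples sequenced to different depths.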
activity Non-invasive Analysis of the Airway Transcriptome Discriminates Clinical Phenotypes of Asthma
Abstract/Synopsis
RATIONALE: It is increasingly evident that pathobiologic alterations in asthma are heterogeneous and that the airway transcriptome has the potential to identify this heterogeneity, ultimately defining “transcriptional endotypes of asthma” (TEA clusters). To this end, we conducted an unsupervised KEGG pathway-based clustering analysis of gene expression in the induced sputum of adults and children with asthma. The identified TEA clusters were correlated with demographic, physiologic and inflammatory phenotypes of the disease. Differentially expressed genes between each TEA cluster and controls were identified and analyzed for functional enrichment to better understand the pathobiology of each TEA cluster.
METHODS: Transcriptome-wide gene expression profiles in the sputum and circulation of asthma patients were measured using Affymetrix HuGene 1.0ST arrays. Unsupervised clustering analysis based on pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) was used to identify TEA clusters from the sputum gene expression profiles. The identified TEA clusters were correlated with clinical, physiologic, and inflammatory characteristics of the disease. Lastly, logistic regression analysis of expression profiles in matched blood samples defined an expression profile in the circulation, which was used to determine TEA cluster assignment in a cohort of children with asthma for validation.
RESULTS: Three TEA clusters were identified. Compared to the other TEA clusters, TEA cluster 1 had the most subjects with a history of intubation (P=0.05), a lower pre-bronchodilator FEV1 (P=0.006), a higher bronchodilator response (P=0.03), and higher exhaled nitric oxide levels (P=0.04). TEA cluster 2, the smallest cluster, had the most subjects who were hospitalized for asthma (P=0.04). TEA cluster 3, the largest cluster, had normal lung function, low exhaled nitric oxide levels, and lower inhaled steroid requirements. Evaluation of TEA clusters in children from the Asthma BRIDGE cohort confirmed that TEA clusters 1 and 2 are associated with a history of intubation (P=5.58e-06) and hospitalization (P=0.01), respectively. Evaluation of the TH2 gene signatures suggested a much lower prevalence of TH2-high disease than previously reported and a weak overlap between the identified TEA clusters and the TH2 high/low diagram. This indicates that the TEA clusters are driven by biologic phenomena that are “upstream” of or parallel to TH2 inflammation.
CONCLUSIONS: Non-invasive analysis of the sputum transcriptome conducted in this study identified three TEA clusters with different clinical and physiologic characteristics of disease. Two TEA clusters are associated with phenotypes of severe disease: a history of near-fatal asthma and a history of hospitalization for asthma.
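The general idea of pathway-based unsupervised clustering can be sketched as follows. The abstract does not specify how pathway activity was scored or which clustering algorithm was used, so this sketch makes two assumptions: each pathway is scored per sample as the mean z-score of its member genes, and samples are grouped with plain k-means using a deterministic farthest-point initialization. All gene names, pathway names, and values are hypothetical.

```python
import statistics


def zscore_genes(expr):
    """Standardize each gene across samples. expr maps gene -> list of values."""
    out = {}
    for gene, values in expr.items():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        out[gene] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return out


def pathway_scores(expr, pathways):
    """Return one row per sample holding the mean z-score of each pathway."""
    z = zscore_genes(expr)
    n_samples = len(next(iter(expr.values())))
    names = sorted(pathways)
    rows = []
    for s in range(n_samples):
        row = []
        for name in names:
            members = [z[g][s] for g in pathways[name] if g in z]
            row.append(statistics.fmean(members) if members else 0.0)
        rows.append(row)
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: 4 genes measured in 4 samples, 2 pathways.
expr = {
    "G1": [1.0, 5.0, 1.0, 5.0],
    "G2": [2.0, 6.0, 2.0, 6.0],
    "G3": [7.0, 1.0, 7.0, 1.0],
    "G4": [8.0, 2.0, 8.0, 2.0],
}
pathways = {"PATHWAY_X": ["G1", "G2"], "PATHWAY_Y": ["G3", "G4"]}

scores = pathway_scores(expr, pathways)
labels = kmeans(scores, k=2)
print(labels)  # [0, 1, 0, 1]
```

Clustering on pathway-level scores rather than on individual genes reduces dimensionality and ties each cluster to interpretable biology, which is one motivation for the pathway-based approach.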