Computational Methods & Tools Developed in the Kleinstein Lab
The Immcantation framework provides a start-to-finish analytical ecosystem for high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) datasets, with a focus on B cell receptor (BCR) repertoire profiling. Beginning from raw reads, Python and R packages are provided for pre-processing, population structure determination, and repertoire analysis. An overview of AIRR-seq analysis can be found in (Yaari and Kleinstein, 2015).
The Immcantation framework includes:
- pRESTO (REpertoire Sequencing TOolkit), a state-of-the-art software toolkit for pre-processing high-throughput BCR and TCR sequencing data. Described in (Vander Heiden, Yaari et al., 2014).
- Change-O, a suite of computational methods for advanced analysis of BCR sequencing data, including identification of clonally related sequences. Described in (Gupta, Vander Heiden et al., 2015) and (Gupta et al., 2017).
- TIgGER (Tool for Ig Genotype Elucidation via airR-seq), a set of methods for identifying novel V gene alleles and constructing subject-specific genotypes. Described in (Gadala-Maria et al., 2015).
- Alakazam, a set of methods for B cell clonal lineage tree construction and diversity analysis.
- SHazaM, a framework for advanced statistical analysis of somatic hypermutation (SHM) patterns. This includes selection analysis with BASELINe (Bayesian estimation of Antigen-driven SELectIoN), described in (Yaari et al., 2012), as well as the S5F modeling framework for SHM targeting and nucleotide substitution, described in (Yaari et al., 2013).
- SCOPer, a computational framework for identifying B cell clonal relationships from AIRR-seq data. It includes methods for assigning clonal identifiers using sequence identity, hierarchical clustering, and spectral clustering. Described in (Nouri and Kleinstein, 2018) and (Nouri and Kleinstein, 2020).
- Dowser, a set of tools for performing phylogenetic analysis on B cell receptor repertoires. It supports building and visualizing trees using multiple methods, and implements statistical methods for inferring B cell migration, differentiation, and isotype switching networks. Described in (Hoehn et al., 2022).
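The idea of assigning clonal identifiers by sequence identity and hierarchical clustering, mentioned for SCOPer above, can be illustrated with a minimal Python sketch. This is not SCOPer's API or its exact algorithm. It assumes a common simplification: sequences are first partitioned by V gene, J gene, and junction length, and then joined by single-linkage clustering on normalized Hamming distance between junctions, cut at a fixed threshold. The function names, the record layout, and the 0.15 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(records, threshold=0.15):
    """Return one integer clone ID per record.

    records: list of (v_gene, j_gene, junction) tuples (hypothetical layout).
    threshold: illustrative cutoff on normalized Hamming distance.
    """
    # Step 1: partition by V gene, J gene, and junction length.
    groups = defaultdict(list)
    for idx, (v_gene, j_gene, junction) in enumerate(records):
        groups[(v_gene, j_gene, len(junction))].append(idx)

    # Step 2: single-linkage clustering cut at a fixed threshold is
    # equivalent to finding connected components, done here with union-find.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for members in groups.values():
        for a, b in combinations(members, 2):
            if hamming_fraction(records[a][2], records[b][2]) <= threshold:
                parent[find(a)] = find(b)

    # Step 3: relabel component roots as consecutive clone IDs.
    labels = {}
    clone_ids = []
    for i in range(len(records)):
        root = find(i)
        if root not in labels:
            labels[root] = len(labels) + 1
        clone_ids.append(labels[root])
    return clone_ids


example = [
    ("IGHV1-2", "IGHJ4", "TGTGCGAGAGAT"),
    ("IGHV1-2", "IGHJ4", "TGTGCGAGAGAC"),   # one mismatch from the first
    ("IGHV1-2", "IGHJ4", "TGTAAATTTCCC"),   # same genes, distant junction
    ("IGHV3-23", "IGHJ4", "TGTGCGAGAGAT"),  # different V gene
]
print(assign_clones(example))
```

In this toy input only the first two records share a clone ID. The pairwise comparison is quadratic within each partition, which is acceptable for a sketch but not for full repertoires.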
LogMiNeR (Logistic Multiple Network-constrained Regression) is a method for analyzing high-throughput transcriptional profiling data (e.g., microarray or RNA-seq) in which multiple networks encoding prior knowledge are incorporated within a logistic modeling framework to improve model interpretability. A complete description of the method is available in (Avey et al., 2017).
QuSAGE is an R/Bioconductor package that implements the Quantitative Set Analysis for Gene Expression method described in (Yaari et al., Nucleic Acids Res, 2013). It is an alternative to existing gene set methods, such as GSEA, and provides a faster, more accurate, and easier to understand test for gene expression studies. QuSAGE accounts for inter-gene correlations and quantifies gene set activity with a complete probability density function (PDF). P values and confidence intervals can be extracted directly from this PDF. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability.
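To make concrete how a P value and a confidence interval can be read off a PDF of gene set activity, here is a minimal Python sketch. It is not the QuSAGE package's API. It assumes the PDF is available as density values on a grid, integrates it numerically with the trapezoidal rule, and uses one common convention for a two-sided P value against a null activity of zero. All function names are hypothetical, and the Gaussian used as input is purely illustrative.

```python
import math


def cdf_from_pdf(xs, density):
    """Cumulative trapezoidal integral of the density, normalized to end at 1."""
    cdf = [0.0]
    for i in range(len(xs) - 1):
        step = (xs[i + 1] - xs[i]) * (density[i] + density[i + 1]) / 2
        cdf.append(cdf[-1] + step)
    total = cdf[-1]
    return [c / total for c in cdf]


def cdf_at(xs, cdf, x):
    """Linearly interpolate the CDF at an arbitrary point x."""
    if x <= xs[0]:
        return 0.0
    if x >= xs[-1]:
        return 1.0
    for i in range(1, len(xs)):
        if xs[i] >= x:
            frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return cdf[i - 1] + frac * (cdf[i] - cdf[i - 1])
    return 1.0


def quantile(xs, cdf, q):
    """Linearly interpolate the inverse CDF at probability q."""
    for i in range(1, len(xs)):
        if cdf[i] >= q:
            lo, hi = cdf[i - 1], cdf[i]
            if hi == lo:
                return xs[i]
            frac = (q - lo) / (hi - lo)
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])
    return xs[-1]


def two_sided_p(xs, cdf, null_value=0.0):
    """Two-sided P value: twice the smaller tail mass on either side of the null."""
    below = cdf_at(xs, cdf, null_value)
    return 2 * min(below, 1 - below)


# Illustrative input only: a Gaussian activity PDF with mean 0.5 and sd 0.25.
xs = [-2 + 0.005 * i for i in range(1001)]
density = [math.exp(-0.5 * ((x - 0.5) / 0.25) ** 2) for x in xs]

cdf = cdf_from_pdf(xs, density)
p_value = two_sided_p(xs, cdf)
ci_95 = (quantile(xs, cdf, 0.025), quantile(xs, cdf, 0.975))
print(p_value, ci_95)
```

For this illustrative Gaussian, the P value comes out near 0.046 and the 95% interval near (0.01, 0.99). Because the full distribution is kept, the same grid could also be reused for later comparisons between gene sets.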
Cell subset prediction for blood genomic studies (SPEC) is a computational method to predict the cellular source for a pre-defined list of genes (i.e., a gene signature) using gene expression data from total peripheral blood mononuclear cells (PBMCs). Details of the method are described in (Bolen et al., BMC Bioinformatics, 2011).
The TIme-Dependent Activity Linker (TIDAL) generates a transcription factor regulatory network from time-series gene expression data. It identifies the transcription factors that are active at each time point in the data and links them into a coherent cascade that can be visualized. Details of the method are described in (Zaslavsky et al., BMC Bioinformatics, 2013) and (Zaslavsky et al., Journal of Immunology, 2010).