The Immcantation framework provides a start-to-finish analytical ecosystem for high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) datasets, with a focus on B cell receptor (BCR) repertoire profiling. Beginning from raw reads, Python and R packages are provided for pre-processing, population structure determination, and repertoire analysis. An overview of AIRR-seq analysis can be found in (Yaari and Kleinstein, 2015).
The Immcantation framework includes:
- pRESTO (REpertoire Sequencing TOolkit), a state-of-the-art software toolkit for pre-processing high-throughput BCR and TCR sequencing data. Described in (Vander Heiden, Yaari et al., 2014).
- Change-O, a suite of computational methods for advanced analysis of BCR sequencing data, including identification of clonally-related sequences. Described in (Gupta, Vander Heiden et al., 2015) and (Gupta et al., 2017).
- TIgGER (Tool for Ig Genotype Elucidation via airR-seq), a set of methods for identifying novel V gene alleles and constructing subject-specific genotypes. Described in (Gadala-Maria et al., 2015).
- aLAkazam, a set of methods for B cell clonal lineage tree construction and diversity analysis.
- SHazaM, a framework for advanced statistical analysis of somatic hypermutation (SHM) patterns. This includes selection analysis with BASELINe (Bayesian estimation of Antigen-driven SELectIoN), described in (Yaari et al., 2012), as well as the S5F modeling framework for SHM targeting and nucleotide substitution, described in (Yaari et al., 2013).
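To make the idea of repertoire diversity analysis more concrete, the following is a minimal sketch in plain Python. It assumes that diversity is summarized with Hill numbers computed from clone sizes, which is a common choice in repertoire studies; the text above does not state which measure is used, and the function name `hill_diversity` and the example clone labels are hypothetical, not part of any Immcantation package.

```python
import math
from collections import Counter


def hill_diversity(clone_sizes, q):
    """Hill number of order q for a list of clone sizes (sequence counts).

    Illustrative helper only; this is not an Immcantation API.
    """
    total = sum(clone_sizes)
    freqs = [n / total for n in clone_sizes if n > 0]
    if q == 1:
        # The q -> 1 limit is the exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


# Hypothetical clone assignments, one label per sequence.
clone_ids = ["c1"] * 97 + ["c2", "c3", "c4"]
sizes = list(Counter(clone_ids).values())

# q = 0 counts clones; larger q gives more weight to expanded clones.
for q in (0, 1, 2):
    print(q, round(hill_diversity(sizes, q), 3))
```

For a repertoire dominated by one expanded clone, as in this made-up example, the value drops quickly as `q` increases, whereas a repertoire of equally sized clones gives the same value at every order.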
LogMiNeR (Logistic Multiple Network-constrained Regression) is a method for analyzing high-throughput transcriptional profiling data (e.g., microarray or RNA-seq) in which multiple networks encoding prior knowledge are incorporated within a logistic modeling framework to improve model interpretability. A complete description of the method is available in (Avey et al., 2017).
The qusage R/Bioconductor package implements the Quantitative Set Analysis for Gene Expression (QuSAGE) method described in (Yaari et al., Nucleic Acids Res, 2013). QuSAGE is a substitute for existing gene set methods, such as GSEA, and provides a faster, more accurate, and easier-to-understand test for gene expression studies. QuSAGE accounts for inter-gene correlations and quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability.
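As an illustration of how P values and confidence intervals can be read off a PDF of gene set activity, here is a minimal sketch in plain Python. It is not the qusage API: the function name `summarize_activity_pdf`, the evenly spaced grid, the Gaussian-shaped example density, and the definition of the P value as a two-sided tail probability against zero activity are all assumptions made for this example.

```python
import math


def summarize_activity_pdf(grid, density, level=0.95):
    """Return (mean, (lower, upper), p_value) for a PDF sampled on a grid.

    Assumes `grid` is ascending and evenly spaced. Illustrative only.
    """
    total = sum(density)
    probs = [d / total for d in density]  # discrete probability masses

    # Cumulative distribution along the grid.
    cdf, running = [], 0.0
    for p in probs:
        running += p
        cdf.append(running)

    mean = sum(x * p for x, p in zip(grid, probs))

    # Equal-tailed interval: first grid points where the CDF crosses each tail.
    alpha = 1.0 - level
    lower = next(x for x, c in zip(grid, cdf) if c >= alpha / 2)
    upper = next(x for x, c in zip(grid, cdf) if c >= 1.0 - alpha / 2)

    # Two-sided tail probability against zero activity (mass exactly at 0 is ignored).
    below = sum(p for x, p in zip(grid, probs) if x < 0)
    above = sum(p for x, p in zip(grid, probs) if x > 0)
    p_value = min(1.0, 2.0 * min(below, above))
    return mean, (lower, upper), p_value


# Made-up activity PDF: bell-shaped, centered at 0.5 with spread 0.2.
grid = [(i - 100) / 100 for i in range(301)]
density = [math.exp(-0.5 * ((x - 0.5) / 0.2) ** 2) for x in grid]

mean, (lower, upper), p_value = summarize_activity_pdf(grid, density)
print(f"mean={mean:.3f}  CI=({lower:.2f}, {upper:.2f})  p={p_value:.4f}")
```

Because the whole distribution is kept rather than a single statistic, the same grid could be reused for later comparisons between gene sets, which is the kind of post-hoc analysis the description refers to.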
Cell subset prediction for blood genomic studies (SPEC) is a computational method to predict the cellular source for a pre-defined list of genes (i.e., a gene signature) using gene expression data from total PBMCs. Details of the method are described in (Bolen et al., BMC Bioinformatics, 2011).
The TIme-Dependent Activity Linker (TIDAL) generates a transcription factor regulatory network from time-series gene expression data. It identifies the transcription factors that are active at each time point in your data and links these factors in a coherent cascade that can be visualized. Details of the method are described in (Zaslavsky et al., BMC Bioinformatics, 2013) and (Zaslavsky et al., Journal of Immunology, 2010).