Computational Methods & Tools Developed in the Kleinstein Lab

Immcantation framework

The Immcantation framework provides a start-to-finish analytical ecosystem for high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) datasets, with a focus on B cell receptor (BCR) repertoire profiling. Beginning from raw reads, Python and R packages are provided for pre-processing, population structure determination, and repertoire analysis. An overview of AIRR-seq analysis can be found in (Yaari and Kleinstein, 2015).

nf-core/airrflow is a best-practice pipeline for analyzing adaptive immune repertoire sequencing data from start to finish using the Immcantation framework tools. It supports analysis of bulk and single-cell targeted AIRR-seq/VDJ libraries, starting from either raw reads or assembled sequences, as well as the extraction of BCR and TCR sequences from untargeted RNA-seq and single-cell RNA-seq data. Described in (Gabernet, Marquez et al., 2024).

The Immcantation framework includes:

pRESTO (REpertoire Sequencing TOolkit), a state-of-the-art software toolkit for pre-processing high throughput BCR and TCR sequencing data. Described in (Vander Heiden, Yaari et al., 2014).
Change-O, a suite of computational methods for advanced analysis of BCR sequencing data, including identification of clonally-related sequences. Described in (Gupta, Vander Heiden et al., 2015) and (Gupta et al., 2017).
TIgGER (Tool for Ig Genotype Elucidation via airR-seq), a set of methods for identifying novel V gene alleles and constructing subject-specific genotypes. Described in (Gadala-Maria et al., 2015).
Alakazam, a set of methods for B cell clonal lineage tree construction and diversity analysis.
SHazaM, a framework for advanced statistical analysis of somatic hypermutation (SHM) patterns. This includes selection analysis with BASELINe (Bayesian estimation of Antigen-driven SELectIoN), described in (Yaari et al., 2012), as well as the S5F modeling framework for SHM targeting and nucleotide substitution, described in (Yaari et al., 2013).
SCOPer, a computational framework for the identification of B cell clonal relationships from AIRR-seq data. It includes methods for assigning clonal identifiers using sequence identity, hierarchical clustering, and spectral clustering. Described in (Nouri and Kleinstein, 2018) and (Nouri and Kleinstein, 2020).
Dowser, a set of tools for performing phylogenetic analysis on B cell receptor repertoires. It supports building and visualizing trees using multiple methods, and implements statistical methods for inferring B cell migration, differentiation, and isotype switching networks. Described in (Hoehn et al., 2022).
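
To make the idea of assigning clonal identifiers by hierarchical clustering more concrete, the following is a minimal Python sketch, not the SCOPer or Change-O implementation. It assumes sequences are compared only on their junction region, partitioned by junction length, and linked by single linkage when their normalized Hamming distance is at or below a cutoff. The function names and the 0.15 threshold are hypothetical choices for illustration; in practice, sequences are typically also partitioned by V and J gene assignments and the threshold is tuned to the dataset.

```python
from collections import defaultdict


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length strings."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_clones(junctions, threshold=0.15):
    """Assign an integer clone id to each junction sequence.

    Sequences are partitioned by length. Within a partition, two sequences
    are linked when their normalized Hamming distance is <= threshold
    (an illustrative cutoff), and linked sequences share a clone id.
    Cutting a single-linkage tree at a fixed distance is equivalent to
    taking connected components, which is what the union-find does here.
    """
    parent = list(range(len(junctions)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    by_length = defaultdict(list)
    for idx, seq in enumerate(junctions):
        by_length[len(seq)].append(idx)

    for members in by_length.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if hamming_fraction(junctions[i], junctions[j]) <= threshold:
                    parent[find(j)] = find(i)

    # Relabel roots as consecutive ids in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1)
            for i in range(len(junctions))]


if __name__ == "__main__":
    example = ["TGTGCGAGAGAT", "TGTGCGAGAGAC", "TGTGCAAAATTT", "TGTGCG"]
    print(assign_clones(example))
```

In this toy input the first two junctions differ at one of twelve positions and are grouped together, while the third is too distant and the fourth has a different length, so each of those forms its own group.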

LogMiNeR

LogMiNeR (Logistic Multiple Network-constrained Regression) is a method for analyzing high-throughput transcriptional profiling data (e.g., microarray or RNA-seq) in which multiple networks encoding prior knowledge are incorporated within a logistic modeling framework to improve model interpretability. A complete description of the method is available in (Avey et al., 2017).

SPEAR

SPEAR (Signature-based multiPle-omics intEgration via lAtent factoRs) is a supervised variational Bayesian factor model that effectively integrates multi-omics data, reduces dimensionality into latent factors, and identifies predictive signatures of disease outcomes. This method improves both the reconstruction of underlying factors and prediction accuracy when modeling paired multi-omics assays with a response of interest. Details of the method are described in (Gygi et al., 2024).

nipalsMCIA

nipalsMCIA is an R/Bioconductor package that uses Nonlinear Iterative Partial Least Squares (NIPALS) to perform joint dimensionality reduction on multi-omic data using Multiple Co-Inertia Analysis (MCIA). The iterative approach allows for fast low-dimensional embedding and visualization of high-dimensional multi-omic datasets, such as those arising from single-cell studies. Details of the method are described in (Mattessich et al., 2025).
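
As a rough illustration of the iterative idea behind NIPALS, the sketch below extracts the first principal component of a single column-centered matrix by alternating between loading and score updates. It is written in plain Python for readability and is not the nipalsMCIA API. The multi-block MCIA extension (block-level scores, global scores, and deflation) is not shown, and the function name, tolerance, and iteration cap are assumptions made for this example.

```python
def nipals_first_component(X, tol=1e-10, max_iter=500):
    """Return (scores, loadings) for the first component of X.

    X is a list of rows and is assumed to be column-centered and not
    identically zero. The loop alternates two least-squares steps until
    the score vector stops changing.
    """
    n, m = len(X), len(X[0])

    # Initialize the scores with the column of largest sum of squares.
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    start = col_ss.index(max(col_ss))
    t = [X[i][start] for i in range(n)]

    p = [0.0] * m
    for _ in range(max_iter):
        # Loadings: regress each column of X on the current scores.
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]

        # Normalize the loadings to unit length.
        norm = sum(v * v for v in p) ** 0.5
        p = [v / norm for v in p]

        # Scores: project each row of X onto the loadings.
        t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]

        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol * sum(v * v for v in t):
            break
    return t, p


if __name__ == "__main__":
    data = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
    scores, loadings = nipals_first_component(data)
    print(scores, loadings)
```

Because each iteration only needs matrix-vector products, this style of update avoids a full eigendecomposition, which is one reason iterative schemes of this kind scale to large datasets.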

QuSAGE

This R/Bioconductor package implements the Quantitative Set Analysis for Gene Expression (QuSAGE) method described in (Yaari et al., Nucleic Acids Res, 2013). QuSAGE is an alternative to existing gene set methods, such as GSEA, and provides a faster, more accurate, and easier-to-understand test for gene expression studies. QuSAGE accounts for inter-gene correlations and quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability.
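
The sketch below illustrates, in generic terms, how a P value and a confidence interval can be read off a discretized PDF of gene set activity. It is not the QuSAGE package code: a normal density stands in for the PDF that QuSAGE would estimate, and the function names, grid size, and example parameters are all assumptions made for illustration.

```python
import math


def normal_pdf_grid(mean, sd, lo, hi, n=4001):
    """Discretize a normal density on [lo, hi] into n grid points.

    Returns the grid and the probability mass at each point, normalized
    to sum to 1. The normal shape is only a stand-in for an estimated PDF.
    """
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    dens = [math.exp(-0.5 * ((x - mean) / sd) ** 2) for x in xs]
    total = sum(dens)
    return xs, [d / total for d in dens]


def summarize_pdf(xs, mass, null_value=0.0, level=0.95):
    """Return (two-sided P value, (lower, upper)) from a discretized PDF."""
    # Cumulative distribution over the grid.
    cdf, running = [], 0.0
    for m in mass:
        running += m
        cdf.append(running)

    # Two-sided P value: twice the smaller tail around the null value.
    below = sum(m for x, m in zip(xs, mass) if x <= null_value)
    p_value = min(1.0, 2 * min(below, 1 - below))

    # Equal-tailed interval from the CDF quantiles.
    alpha = (1 - level) / 2
    lower = next(x for x, c in zip(xs, cdf) if c >= alpha)
    upper = next(x for x, c in zip(xs, cdf) if c >= 1 - alpha)
    return p_value, (lower, upper)


if __name__ == "__main__":
    grid, mass = normal_pdf_grid(mean=1.0, sd=0.5, lo=-3.0, hi=5.0)
    print(summarize_pdf(grid, mass))
```

Keeping the whole PDF rather than a single test statistic is what makes follow-up comparisons possible: two such distributions can be combined numerically to obtain the distribution of their difference.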

SPEC

Cell subset prediction for blood genomic studies (SPEC) is a computational method to predict the cellular source for a pre-defined list of genes (i.e., a gene signature) using gene expression data from total PBMCs. Details of the method are described in (Bolen et al., BMC Bioinformatics, 2011).

TIDAL

The TIme-Dependent Activity Linker (TIDAL) generates a transcription factor regulatory network from time-series gene expression data. It identifies transcription factors that are active at each time point in the data and links these factors in a coherent cascade that can be visualized. Details of the method are described in (Zaslavsky et al., BMC Bioinformatics, 2013) and (Zaslavsky et al., Journal of Immunology, 2010).