Computational Methods & Tools Developed in the Kleinstein Lab


pRESTO (REpertoire Sequencing TOolkit) is an integrated collection of platform-independent Python modules for processing raw reads from high-throughput (next-generation) sequencing of lymphocyte repertoires. pRESTO processes raw sequences to produce error-corrected, sorted and annotated sequence sets, along with a wealth of metrics at each step.


Change-O is a suite of utilities for advanced analysis of large-scale B cell Ig sequencing data sets. Change-O includes tools for determining the complete set of Ig variable region gene segment alleles carried by an individual (including novel alleles), partitioning of Ig sequences into clonal populations, creating lineage trees, inferring somatic hypermutation targeting models, measuring repertoire diversity, quantifying selection pressure, and calculating sequence chemical properties.

This website provides tools to detect and quantify selection from mutated B cell immunoglobulin (Ig) sequences. It implements a statistical framework for Bayesian estimation of Antigen-driven SELectIoN (BASELINe) based on the analysis of somatic mutation patterns. A complete description of the method is available in (Yaari et al., Nucleic Acids Res, 2012). Our previous method, the Focused Z test, developed in (Uduman et al., 2011) and (Hershberg et al., 2008), is also available.

This website provides models of somatic hypermutation (SHM) targeting and nucleotide substitution constructed from high-throughput B cell immunoglobulin (Ig) sequencing data. Source code to construct and visualize these models is also available. The S5F model is constructed using Synonymous mutations in 5-mer motifs of Functional Ig sequences. Version 07312013.1 is based on >800,000 mutations.
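The idea of a 5-mer targeting context can be illustrated with a short Python sketch. It tallies point mutations by the germline 5-mer centered on the mutated base. This is not the S5F source code: the function name `fivemer_mutation_counts` and the example sequences are hypothetical, the input is assumed to be pre-aligned germline/observed pairs of equal length, and the sketch does not restrict counts to synonymous mutations or functional sequences as the S5F model does.

```python
from collections import Counter


def fivemer_mutation_counts(pairs):
    """Count point mutations by the germline 5-mer centered on the mutated base.

    `pairs` is an iterable of (germline, observed) strings of equal length.
    Hypothetical helper for illustration only; it does not filter for
    synonymous mutations.
    """
    counts = Counter()
    for germline, observed in pairs:
        if len(germline) != len(observed):
            raise ValueError("sequences must be aligned to equal length")
        # Skip the first and last two positions, which lack a full 5-mer context.
        for i in range(2, len(germline) - 2):
            if germline[i] != observed[i]:
                counts[germline[i - 2:i + 3]] += 1
    return counts


# Made-up toy sequences, not real Ig data.
example_pairs = [
    ("AAGCTACGT", "AAGTTACGT"),  # mutation at index 3, context AGCTA
    ("AAGCTACGT", "AAGCTATGT"),  # mutation at index 6, context TACGT
    ("TTAGCTATT", "TTAGTTATT"),  # mutation at index 4, context AGCTA
]
example_counts = fivemer_mutation_counts(example_pairs)
```

Dividing each count by the number of times its 5-mer occurs in the germline sequences would give a crude per-motif mutability; a real targeting model would need normalization and handling of rare motifs beyond this sketch.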

This R/Bioconductor package implements the Quantitative Set Analysis for Gene Expression (QuSAGE) method described in (Yaari et al., Nucleic Acids Res, 2013). QuSAGE is a substitute for existing gene set methods, such as GSEA, and provides a faster, more accurate, and easier to understand test for gene expression studies. QuSAGE accounts for inter-gene correlations and quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability.
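To make the step from a PDF to a P value and confidence interval concrete, here is a minimal Python sketch. It is not the QuSAGE package or its API: the function `summarize_pdf`, the evenly spaced grid, the two-sided test against zero activity, and the Gaussian example density are all assumptions chosen for illustration.

```python
import math


def summarize_pdf(xs, density, level=0.95):
    """Return (two_sided_p, (ci_low, ci_high)) for a PDF sampled on an even grid.

    Hypothetical helper: `xs` are evenly spaced activity values and `density`
    holds the PDF evaluated at those points.
    """
    step = xs[1] - xs[0]
    total = sum(density) * step
    # Normalize so the discrete probabilities sum to 1.
    probs = [d * step / total for d in density]

    # Running sum gives the cumulative distribution on the grid.
    cdf = []
    running = 0.0
    for p in probs:
        running += p
        cdf.append(running)

    # Two-sided P value for the null hypothesis of zero activity.
    below_zero = sum(p for x, p in zip(xs, probs) if x < 0)
    p_value = 2 * min(below_zero, 1 - below_zero)

    # Equal-tailed interval: first grid points where the CDF crosses each tail.
    alpha = (1 - level) / 2
    ci_low = next(x for x, c in zip(xs, cdf) if c >= alpha)
    ci_high = next(x for x, c in zip(xs, cdf) if c >= 1 - alpha)
    return p_value, (ci_low, ci_high)


# Assumed example: a Gaussian activity PDF with mean 0.5 and SD 0.25.
grid = [-2 + 0.001 * i for i in range(5001)]
pdf = [math.exp(-0.5 * ((x - 0.5) / 0.25) ** 2) for x in grid]
p_value, (ci_low, ci_high) = summarize_pdf(grid, pdf)
```

Because the whole density is kept, the same grid can be reused for other summaries, such as a different confidence level, without refitting anything.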

Cell subset prediction for blood genomic studies (SPEC) is a computational method to predict the cellular source for a pre-defined list of genes (i.e., a gene signature) using gene expression data from total PBMCs. Details of the method are described in (Bolen et al., BMC Bioinformatics, 2011).

The TIme-Dependent Activity Linker (TIDAL) generates a transcription factor regulatory network from time-series gene expression data. It identifies the transcription factors that are active at each time point in your data and links them in a coherent cascade that can be visualized. Details of the method are described in (Zaslavsky et al., BMC Bioinformatics, 2013) and (Zaslavsky et al., Journal of Immunology, 2010).