Computational Methods & Tools Developed in the Kleinstein Lab



pRESTO (REpertoire Sequencing TOolkit) is an integrated collection of platform-independent Python modules for processing raw reads from high-throughput (next-generation) sequencing of lymphocyte repertoires. pRESTO processes raw sequences to produce error-corrected, sorted and annotated sequence sets, along with a wealth of metrics at each step.

Example workflows for Roche 454 and Illumina (MiSeq) platforms are available.
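
As a rough illustration of one kind of step involved in producing error-corrected sequence sets, the Python sketch below filters FASTQ reads by mean Phred quality. It is not pRESTO's interface: the function names (`parse_fastq`, `mean_phred`, `filter_by_quality`), the Phred+33 quality encoding, and the threshold of 20 are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (read ID, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (read ID, sequence, quality) tuples from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (ln.rstrip("\n") for ln in lines[i:i + 4])
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 (Sanger / Illumina 1.8+) encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_by_quality(reads: Iterable[Read], min_quality: float = 20.0) -> List[Read]:
    """Keep reads whose mean Phred score is at least min_quality (hypothetical default)."""
    return [read for read in reads if mean_phred(read[2]) >= min_quality]


if __name__ == "__main__":
    fastq = [
        "@read1", "ACGTACGT", "+", "IIIIIIII",  # 'I' encodes Q40
        "@read2", "ACGTACGT", "+", "########",  # '#' encodes Q2
    ]
    kept = filter_by_quality(parse_fastq(fastq))
    print([read_id for read_id, _, _ in kept])  # ['read1']
```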

This website provides tools to detect and quantify selection from mutated B cell immunoglobulin (Ig) sequences. It implements a statistical framework for Bayesian estimation of Antigen-driven SELectIoN (BASELINe) based on the analysis of somatic mutation patterns. A complete description of the method is available in (Yaari et al., Nucleic Acids Res, 2012). Our previous method, the Focused Z test (Hershberg et al., 2008; Uduman et al., 2011), is also available.
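
To make the idea of Bayesian selection estimation more concrete, the sketch below computes a posterior for theta, the probability that a mutation is a replacement (R) mutation, and summarizes it as a log-odds quantity, written sigma here, relative to an expected R frequency. This is a simplified illustration under assumptions of our own (binomial likelihood, uniform prior, and an expected R frequency passed in as a plain number rather than derived from a targeting model), not the published BASELINe implementation; the mutation counts in the example are hypothetical.

```python
import math
from typing import List, Tuple


def posterior_grid(n_r: int, n_total: int, points: int = 10_000) -> Tuple[List[float], List[float]]:
    """Normalised posterior weights for theta (probability a mutation is R),
    assuming a binomial likelihood and a uniform prior."""
    thetas = [(i + 0.5) / points for i in range(points)]  # midpoints avoid 0 and 1
    # Log-likelihood in log space so large mutation counts do not underflow.
    logs = [n_r * math.log(t) + (n_total - n_r) * math.log1p(-t) for t in thetas]
    peak = max(logs)
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def selection_strength(n_r: int, n_total: int, expected_r: float) -> Tuple[float, float]:
    """Posterior mean of sigma = logit(theta) - logit(expected_r), and P(sigma > 0).
    expected_r would normally come from a mutation targeting model (assumption)."""
    thetas, probs = posterior_grid(n_r, n_total)
    baseline = logit(expected_r)
    mean_sigma = sum(p * (logit(t) - baseline) for t, p in zip(thetas, probs))
    p_positive = sum(p for t, p in zip(thetas, probs) if t > expected_r)
    return mean_sigma, p_positive


if __name__ == "__main__":
    # Hypothetical counts: 30 R mutations out of 40, expected R frequency 0.5.
    sigma, p_pos = selection_strength(30, 40, 0.5)
    print(f"mean sigma = {sigma:.2f}, P(sigma > 0) = {p_pos:.4f}")
```

In this sketch, a positive sigma means more R mutations than expected and a negative sigma fewer. A grid over theta is used instead of a closed-form Beta posterior so that a non-uniform prior could be substituted without changing the rest of the code.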

This website provides models of somatic hypermutation (SHM) targeting and nucleotide substitution constructed from high-throughput B cell immunoglobulin (Ig) sequencing data. Source code to construct and visualize these models is also available. The S5F model is constructed using Synonymous mutations in 5-mer motifs of Functional Ig sequences. Version 07312013.1 is based on more than 800,000 mutations.
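
The core counting behind a 5-mer targeting model can be sketched as follows: for each synonymous mutation, record the germline 5-mer centred on the mutated base, then divide by how often that 5-mer occurs. This is a deliberately naive estimator built on our own assumptions (ungapped, in-frame germline/observed pairs of equal length; each substitution judged against the unmutated germline codon) and is not the published S5F procedure; the names `fivemer_mutability` and `is_synonymous` and the example sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_synonymous(germline: str, pos: int, new_base: str) -> bool:
    """True if a single substitution at pos leaves the encoded amino acid unchanged.
    Assumes the sequence is in frame from index 0; other mutations in the same
    codon are ignored (a simplification). Ambiguous codons return False."""
    start = pos - pos % 3
    codon = germline[start:start + 3]
    if len(codon) < 3:
        return False
    mutated = codon[:pos - start] + new_base + codon[pos - start + 1:]
    before, after = CODON_TABLE.get(codon), CODON_TABLE.get(mutated)
    return before is not None and before == after


def fivemer_mutability(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Naive mutability per 5-mer: synonymous mutations at the motif's centre base
    divided by the number of times the motif occurs in the germline sequences."""
    mutations: Counter = Counter()
    background: Counter = Counter()
    for germline, observed in pairs:
        for pos in range(2, len(germline) - 2):
            motif = germline[pos - 2:pos + 3]
            background[motif] += 1
            if observed[pos] != germline[pos] and is_synonymous(germline, pos, observed[pos]):
                mutations[motif] += 1
    return {motif: mutations[motif] / count for motif, count in background.items()}


if __name__ == "__main__":
    # GCT -> GCC (Ala -> Ala) is synonymous; GCT -> ACT (Ala -> Thr) is not.
    rates = fivemer_mutability([("ATGGCTGCA", "ATGACCGCA")])
    print(rates["GCTGC"], rates["TGGCT"])  # 1.0 0.0
```

Restricting the counts to synonymous mutations reflects the reasoning behind the "S" in S5F: mutations that do not change the amino acid are expected to be largely free of selection, so their distribution mostly reflects targeting.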

This R/Bioconductor package implements the Quantitative Set Analysis for Gene Expression (QuSAGE) method described in (Yaari et al., Nucleic Acids Res, 2013). QuSAGE is a substitute for existing gene set methods, such as GSEA, and provides a faster, more accurate, and easier-to-understand test for gene expression studies. QuSAGE accounts for inter-gene correlations and quantifies gene set activity with a complete probability density function (PDF). From this PDF, P values and confidence intervals can be easily extracted. Preserving the PDF also allows for post-hoc analysis (e.g., pair-wise comparisons of gene set activity) while maintaining statistical traceability.
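
Once gene set activity is held as a full PDF, P values and confidence intervals reduce to reading off its cumulative distribution. The Python sketch below assumes the PDF is already available on a grid; a normal density stands in for it purely for illustration (QuSAGE itself builds the PDF from gene-level distributions and accounts for inter-gene correlation, which is not reproduced here). It is not the R package's API: the function names, the grid settings, and the choice of a one-sided test for increased activity are assumptions.

```python
import math
from bisect import bisect_left
from typing import List, Tuple


def normal_density_grid(mean: float, sd: float, lo: float, hi: float,
                        points: int = 4001) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for a gene set activity PDF, evaluated on a grid."""
    step = (hi - lo) / (points - 1)
    xs = [lo + i * step for i in range(points)]
    norm = sd * math.sqrt(2.0 * math.pi)
    ys = [math.exp(-0.5 * ((x - mean) / sd) ** 2) / norm for x in xs]
    return xs, ys


def cdf_from_pdf(xs: List[float], ys: List[float]) -> List[float]:
    """Cumulative distribution by the trapezoid rule, rescaled to end at exactly 1."""
    cdf = [0.0]
    for i in range(1, len(xs)):
        cdf.append(cdf[-1] + 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1]))
    total = cdf[-1]
    return [c / total for c in cdf]


def cdf_at(xs: List[float], cdf: List[float], x: float) -> float:
    """CDF evaluated at x by linear interpolation between grid points."""
    if x <= xs[0]:
        return 0.0
    if x >= xs[-1]:
        return 1.0
    i = bisect_left(xs, x)  # xs[i - 1] < x <= xs[i]
    frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return cdf[i - 1] + frac * (cdf[i] - cdf[i - 1])


def quantile(xs: List[float], cdf: List[float], q: float) -> float:
    """Inverse CDF: the x at which the interpolated CDF first reaches q."""
    i = bisect_left(cdf, q)  # cdf[i - 1] < q <= cdf[i]
    if i == 0:
        return xs[0]
    if i >= len(cdf):
        return xs[-1]
    span = cdf[i] - cdf[i - 1]
    frac = (q - cdf[i - 1]) / span if span > 0 else 0.0
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])


if __name__ == "__main__":
    xs, ys = normal_density_grid(mean=0.5, sd=0.25, lo=-2.0, hi=3.0)
    cdf = cdf_from_pdf(xs, ys)
    p_up = cdf_at(xs, cdf, 0.0)  # probability mass at or below zero activity
    ci = (quantile(xs, cdf, 0.025), quantile(xs, cdf, 0.975))
    print(f"one-sided P = {p_up:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```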

Cell subset prediction for blood genomic studies (SPEC) is a computational method to predict the cellular source of a pre-defined list of genes (i.e., a gene signature) using gene expression data from total peripheral blood mononuclear cells (PBMCs). Details of the method are described in (Bolen et al., BMC Bioinformatics, 2011).

The TIme-Dependent Activity Linker (TIDAL) generates a transcription factor regulatory network from time-series gene expression data. It identifies transcription factors that are active at each time point in your data and links these factors into a coherent cascade that can be visualized. Details of the method are described in (Zaslavsky et al., BMC Bioinformatics, 2013) and (Zaslavsky et al., Journal of Immunology, 2010).