Yale Protein Expression Database (YPED)

We have developed an integrated, web-accessible software system, the Yale Protein Expression Database (YPED) [1], to address the storage, retrieval, and integrated analysis of data from high-throughput proteomic and small-molecule analyses. For proteomics data, YPED handles results from the following analyses: LC-MS/MS protein identifications and protein posttranslational modifications (e.g. phosphorylation, ubiquitinylation, acetylation, and methylation); identification and quantitation results from label-based proteomics experiments (such as DIGE, iTRAQ, ICAT, and SILAC); LC-MS-based label-free quantitative proteomics; and targeted proteomics (MRM).

Samples are grouped into projects. This helps researchers keep track of the different stages of their experiments (each represented as a project) and organize the results generated at each stage. A publication can be associated with one or more projects. Sample descriptions are compatible with the evolving MIAPE standards [2]. Researchers can view, subset, and download their data through a secure Web interface. YPED also features modules for sample submission and tracking, and for user billing.
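The sample, project, and publication relationships described above can be pictured with a small Python sketch. This is only an illustration of the grouping, not YPED's actual schema: the class names, field names, and example values are all hypothetical, and MIAPE-specific fields are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    # Hypothetical fields; the real YPED sample record is not described here.
    sample_id: str
    description: str


@dataclass
class Project:
    # A project groups the samples from one experimental stage.
    name: str
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Publication:
    # A publication can be associated with one or more projects.
    title: str
    projects: List[Project] = field(default_factory=list)

    def all_samples(self) -> List[Sample]:
        """Collect every sample from every associated project."""
        return [s for p in self.projects for s in p.samples]


# Made-up example: two experimental stages linked to one publication.
stage1 = Project("stage-1", [Sample("S1", "control"), Sample("S2", "treated")])
stage2 = Project("stage-2", [Sample("S3", "follow-up")])
paper = Publication("Example paper", [stage1, stage2])
sample_ids = [s.sample_id for s in paper.all_samples()]
```

Keeping samples inside projects, and linking publications to projects rather than to individual samples, mirrors the grouping described in the text.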

Data associated with a published paper can be released to a publicly accessible repository called the YPED repository. Reviewers of manuscripts under review can be given private (anonymous) access to the associated data through the YPED repository.

The data are presented in tables in which summary facts can be clicked to access supporting data, all the way down to the MS/MS data (for DIGE samples, the gel images are clickable). Tools are available to facilitate sample comparison and to assess the distribution of biological functions (through a remote query to PANTHER [3]) among the identified proteins in a sample. Users can drill down into phosphoprotein data to view the probability that a peptide is a phosphopeptide and that a given phosphorylation site is actually phosphorylated. A complete targeted proteomics pipeline has been integrated into YPED. It uses our custom peptide spectral library database to facilitate peptide and MRM transition selection for global targeted proteomic analysis, and it provides tools for MRM method export and an interface for collating and reviewing quantitation results.
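As a rough illustration of what a sample comparison involves, the Python sketch below splits the proteins identified in two samples into shared and sample-specific sets. It does not reproduce YPED's own comparison tool; the function name `compare_samples` and the protein identifiers are made up for the example.

```python
from typing import Dict, Iterable, List


def compare_samples(proteins_a: Iterable[str],
                    proteins_b: Iterable[str]) -> Dict[str, List[str]]:
    """Split identified proteins into shared and sample-specific groups."""
    a, b = set(proteins_a), set(proteins_b)
    return {
        "shared": sorted(a & b),   # identified in both samples
        "only_a": sorted(a - b),   # identified only in the first sample
        "only_b": sorted(b - a),   # identified only in the second sample
    }


# Placeholder identifiers, not real accessions or real results.
result = compare_samples(["PROT_1", "PROT_2", "PROT_3"],
                         ["PROT_2", "PROT_3", "PROT_4"])
```

Set operations ignore duplicate identifications within a sample, which suits a presence/absence comparison; a quantitative comparison would need the abundance values as well.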

YPED also serves as a peptide spectral library for all our protein database search identification results. As of March 1, 2012, YPED contained 12,113 datasets from 1,086 users, yielding a database of 441,950 unique LC-MS proteins and 2,304,375 distinct Mascot LC-MS peptides.