The Yale Protein Expression Database (YPED) is an open-source proteomics suite and database designed for the storage, retrieval, and integrated analysis of data from high-throughput proteomic and small-molecule analyses. As users’ needs have evolved, we have significantly enhanced YPED with new features to meet them. YPED provides a comprehensive workflow that extends from sample submission through a web user interface, which gives immediate access to newly acquired data, to an integrated suite of biostatistics and bioinformatics tools for analyzing the resulting mass spectrometric proteomics data.
The initial version of YPED (1) provided requisition, submission, result reporting, and sample comparison for multi-dimensional protein identification technology (MudPIT), difference gel electrophoresis (DIGE), and isotope-coded affinity tag (ICAT) labeled samples. The current version (2) adds modules for LC–MS/MS peptide and protein identification, multiplexed isobaric tagging (iTRAQ and tandem mass tag, TMT), stable isotope labeling by amino acids in cell culture (SILAC), LC–MS/MS label-free quantitation (LFQ), and scoring for phosphopeptide localization. Using the discovery proteomic results, we have built a multiple reaction monitoring/selected reaction monitoring (MRM/SRM) targeted proteomics pipeline that includes an MS/MS spectral library. The peptide sequences in the spectral library have been compared via protein BLAST against the Swiss-Prot and TrEMBL databases to determine whether each sequence is unique to a specific protein and organism.
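The uniqueness check described above can be illustrated with a minimal sketch. It does not run BLAST and is not YPED's implementation; it assumes the search hits have already been reduced to (peptide, protein accession, organism) records, and the function name, record layout, sequences, and accessions are all made up for illustration.

```python
from collections import defaultdict


def classify_peptides(hits):
    """Flag each peptide as unique (or not) to one protein and one organism.

    `hits` is an iterable of (peptide, protein_accession, organism) tuples,
    a hypothetical layout standing in for parsed database search hits.
    """
    proteins = defaultdict(set)
    organisms = defaultdict(set)
    for peptide, accession, organism in hits:
        proteins[peptide].add(accession)
        organisms[peptide].add(organism)
    return {
        peptide: {
            "unique_to_protein": len(proteins[peptide]) == 1,
            "unique_to_organism": len(organisms[peptide]) == 1,
        }
        for peptide in proteins
    }


# Placeholder sequences and accessions, not real database entries.
example_hits = [
    ("AAAGK", "ACC0001", "Homo sapiens"),
    ("LLVEK", "ACC0001", "Homo sapiens"),
    ("LLVEK", "ACC0002", "Homo sapiens"),
    ("TTPSR", "ACC0003", "Homo sapiens"),
    ("TTPSR", "ACC0004", "Mus musculus"),
]
result = classify_peptides(example_hits)
```

In this toy input, only AAAGK maps to a single protein in a single organism; LLVEK is shared between two proteins of one organism, and TTPSR occurs in two organisms.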
YPED consists of both a local database and a public repository that provides access to published and anonymous results. Its wide range of data access privileges enables it to meet the needs of individual laboratories, groups of collaborating laboratories, and core laboratories. It supports multiple MS instruments and search engines, as well as quantitation of labeled and label-free proteomics data. Sample/project annotations and search results stored in the database can be queried and viewed via a web user interface. We have also developed and integrated a suite of statistical analysis tools to enhance the quality and visualization of data. In addition, the YPED system is interoperable with a number of external resources to leverage proteomics databases and tools created by other groups.
As of October 4, 2015, YPED contained 20,216 datasets from 1,789 users in the laboratories of 755 principal investigators at more than 200 institutions around the world. These datasets contain LC–MS/MS analyses of 4,461,056 distinct peptides derived from 1,133,725 distinct proteins. YPED’s spectral library contains spectra from 340,449 distinct human peptides. It contains ⩾2 distinct peptides for each of 19,327 human, 16,154 mouse, 7,661 rat, 6,007 yeast, and 4,080 Escherichia coli proteins.
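A per-organism tally of proteins with at least two distinct peptides, like the one reported above, could be computed along the following lines. This is only a sketch under assumed inputs: the (organism, protein, peptide) record layout and the function name are hypothetical and do not reflect YPED's schema or code.

```python
from collections import defaultdict


def proteins_with_min_peptides(identifications, min_peptides=2):
    """Count, per organism, proteins with at least `min_peptides` distinct peptides.

    `identifications` is an iterable of (organism, protein, peptide) tuples,
    an assumed layout used only for this illustration.
    """
    peptides_by_protein = defaultdict(set)
    for organism, protein, peptide in identifications:
        # A set keeps repeated observations of the same peptide from counting twice.
        peptides_by_protein[(organism, protein)].add(peptide)

    counts = defaultdict(int)
    for (organism, _protein), peptides in peptides_by_protein.items():
        if len(peptides) >= min_peptides:
            counts[organism] += 1
    return dict(counts)


# Toy records with placeholder protein names and sequences.
toy_records = [
    ("human", "protA", "AAAGK"),
    ("human", "protA", "LLVEK"),
    ("human", "protA", "LLVEK"),  # repeat observation of the same peptide
    ("human", "protB", "TTPSR"),
    ("mouse", "protC", "GGLDK"),
    ("mouse", "protC", "VVNER"),
]
tally = proteins_with_min_peptides(toy_records)
```

Here `tally` counts one human protein (protA) and one mouse protein (protC); protB is excluded because it has a single distinct peptide.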
(1) Shifman, M.A., Li, Y., Colangelo, C.M., Stone, K.L., Wu, T.L., Cheung, K.H., Miller, P.L., Williams, K.R. (2007) YPED: a web-accessible database system for protein expression analysis. Journal of Proteome Research 6(10): 4019-24 (PMCID: PMC3863627).
(2) Colangelo, C.M., Shifman, M., Cheung, K.H., Stone, K.L., Carriero, N.J., Gulcicek, E.E., Lam, T.T., Wu, T., Bjornson, R.D., Bruce, C., Nairn, A.C., Rinehart, J., Miller, P.L., Williams, K.R. (2015) YPED: an integrated bioinformatics suite and database for mass spectrometry-based proteomics research. Genomics Proteomics Bioinformatics 13(1): 25-35 (PMCID: PMC4411476).