The Yale Protein Expression Database (YPED)
We have developed an integrated, web-accessible software system called the Yale Protein Expression Database, or YPED (1), to address the storage, retrieval, and integrated analysis of high-throughput proteomic analyses. YPED handles all types of proteomic analyses, including small molecule analyses, LC-MS/MS protein identifications, and identification/quantitation results from protein profiling experiments such as DIGE, iTRAQ, ICAT, and SILAC. YPED also contains a repository for public access to protein identification data, which is used when a paper is published from the data. Sample descriptions are compatible with the evolving MIAPE standards (2). For DIGE experiments, the system associates the DIGE gel spots and image, analyzed with DeCyder™, with mass spectrometric protein identifications from selected gel spots. The spots are analyzed using MALDI-TOF/TOF, with protein identifications performed by Mascot in conjunction with the AB GPS Explorer system. Researchers can view, subset, and download their data through a secure Web interface, which includes a table of the proteins identified, a sample summary, the sample description, and a clickable gel image for DIGE samples. An example of DIGE results in YPED is shown below. Tools are available to facilitate sample comparison, viewing of phosphoproteins, and calculation of the mass defect of phosphopeptides (and therefore the probability that a peptide is a phosphopeptide), along with links to the PANTHER Classification System.
For Targeted Proteomics, reports of peptide concentrations are listed and a synthetic peptide database of AQUA peptides utilized in the MRM analysis is maintained.
Users log in to YPED with a unique login and password, which allows them to view their proteomics and small molecule data. Clicking on a link in the table pulls up the data results for the sample of interest.
Apply for a YPED account
- Shifman, M.A., Li, Y., Colangelo, C.M., Stone, K.L., Wu, T.L., Cheung, K., Miller, P.L., Williams, K.R. (2007) YPED: A Web-Accessible Database System for Protein Expression Analysis. J. Proteome Research, 6, 4019-4024.
- Taylor, C. F., N. W. Paton, K. S. Lilley, P.-A. Binz, R. K. Julian, A. R. Jones, W. Zhu, R. Apweiler, R. Aebersold, E. W. Deutsch, M. J. Dunn, A. J. R. Heck, A. Leitner, M. Macht, M. Mann, L. Martens, T. A. Neubert, S. D. Patterson, P. Ping, S. L. Seymour, P. Souda, A. Tsugita, J. Vandekerckhove, T. M. Vondriska, J. P. Whitelegge, M. R. Wilkins, I. Xenarios, J. R. Yates and H. Hermjakob (2007). "The minimum information about a proteomics experiment (MIAPE)." Nat Biotech 25(8): 887-893.
Description of YPED Parameters
- The top box contains information on the sample: the search engine, version of the search engine, search title, database searched and MS file name.
- The protein threshold score: Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits (see http://www.matrixscience.com/help/scoring_help.html for details on the scoring).
- Score: The protein score in a Peptide Summary is derived from the ions scores. For a search that contains a small number of queries, the protein score is the sum of the unique ions scores, that is, excluding the scores for duplicate matches. A small correction is applied to reduce the contribution of low-scoring random matches. This correction is a function of the total number of molecular mass matches for each query and the width of the peptide tolerance window. It is usually very small, except in no-enzyme searches.
- Decoy database search: During the normal search, every time a protein sequence from the target database is tested, a random sequence of the same length is automatically generated and tested. The average amino acid composition of the random sequences is the same as the average composition of the target database. The matches and scores for the random sequences are recorded separately in the result file. When the search is complete, the statistics for matches to the random sequences, which are effectively sequences from a decoy database, are reported in the result header.
- Expectation: The expectation value for the peptide match, that is, the number of times we would expect to obtain an equal or higher score purely by chance. The lower this value, the more significant the result.
- % coverage: The percentage of the identified protein's sequence that is covered by peptide matches.
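To illustrate the % coverage value, here is a minimal Python sketch. It assumes coverage means the percentage of residues in the protein sequence that fall within at least one matched peptide; this page does not state how YPED or Mascot computes the figure, and the function name and example sequences are hypothetical.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percent of protein residues covered by at least one matched peptide.

    Assumption: a residue counts as covered if it lies inside any exact
    occurrence of a matched peptide in the protein sequence.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # ignore empty peptide strings
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 10-residue protein with two overlapping peptide matches.
print(sequence_coverage("ABCDEFGHIJ", ["ABC", "CDE"]))
```

Overlapping peptides are counted once per residue, so the value never exceeds 100%.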
PEPTIDES
- m/z: The observed mass-to-charge ratio in the mass spectrum. The ion may be singly, doubly, triply, etc. charged; the charge is listed in the last column.
- Score: The peptide score.
- Ion mass: The mass determined from the m/z and the charge state.
- Ion mass calculated: The mass of the peptide from the theoretical sequence.
- Delta: The mass difference between the ion mass observed in the spectrum and the calculated ion mass.
- Ppm: The mass error in parts per million for the peptide match (based on the Delta value). On the LTQ Orbitrap, this should be better than ~5 ppm.
- Peptides greater than or less than the identity threshold:
- The identity threshold is calculated from the number of trials. If there are 5000 precursor matches, a 1 in 20 chance of getting a false positive match corresponds to a probability of P = 1 / (20 x 5000), which is a score of S = -10Log(P) = 50.
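The mass-related columns in the list above are linked by simple arithmetic. The following Python sketch shows one way to relate them. It assumes that "Ion mass" is the neutral peptide mass obtained by removing one proton mass per charge, and that Ppm is Delta divided by the calculated mass, times one million. Neither convention is spelled out on this page, and the function names and example values are illustrative only.

```python
PROTON_MASS = 1.007276466  # Da; mass of a proton (assumed charge carrier)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass from an observed m/z and a positive charge state."""
    return charge * (mz - PROTON_MASS)


def delta_mass(observed: float, calculated: float) -> float:
    """Difference between observed and calculated mass, in Da."""
    return observed - calculated


def ppm_error(observed: float, calculated: float) -> float:
    """Mass error in parts per million, relative to the calculated mass."""
    return delta_mass(observed, calculated) / calculated * 1e6


# Hypothetical doubly charged ion observed at m/z 500.0
observed = neutral_mass(500.0, 2)
print(observed)

# A 0.002 Da error on a 1000 Da peptide is 2 ppm.
print(ppm_error(1000.002, 1000.0))
```

With these definitions, a 5 ppm tolerance on a 1000 Da peptide corresponds to a Delta of about 0.005 Da.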
In Mascot, the score for an MS/MS match is based on the absolute probability (P) that the observed match between the experimental data and the database sequence is a random event. The reported score is -10Log(P). So, during a search, if 1.5 x 10^5 peptides fell within the mass tolerance window about the precursor mass, and the significance threshold was chosen to be 0.05, (a 1 in 20 chance of a false positive), this would translate into a score threshold of 65.
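The threshold arithmetic described above can be checked with a short Python sketch. It assumes the threshold is simply -10Log(P) with P equal to the significance level divided by the number of candidate peptides, as in the two worked examples; the function name is hypothetical and this is not Mascot's own code.

```python
import math


def identity_threshold(n_candidates: float, significance: float = 0.05) -> float:
    """Score threshold S = -10 * log10(significance / n_candidates)."""
    p = significance / n_candidates
    return -10.0 * math.log10(p)


# 5000 precursor matches at a 1 in 20 false positive rate
print(round(identity_threshold(5000)))

# 1.5 x 10^5 candidate peptides at the same significance level
print(round(identity_threshold(1.5e5)))
```

The two calls reproduce the thresholds of 50 and 65 quoted in the text; the second value is about 64.8 before rounding.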