Review Open access Peer reviewed

Modification Site Localization Scoring: Strategies and Performance

2012; Elsevier BV; Volume: 11; Issue: 5; Language: English

10.1074/mcp.r111.015305

ISSN

1535-9484

Authors

Robert J. Chalkley, Karl R. Clauser

Topic(s)

Genomics and Phylogenetic Studies

Abstract

Using enrichment strategies many research groups are routinely producing large data sets of post-translationally modified peptides for proteomic analysis using tandem mass spectrometry. Although search engines are relatively effective at identifying these peptides with a defined measure of reliability, their localization of site/s of modification is often arbitrary and unreliable. The field continues to be in need of a widely accepted metric for false localization rate that accurately describes the certainty of site localization in published data sets and allows for consistent measurement of differences in performance of emerging scoring algorithms. In this article are discussed the main strategies currently used by software for modification site localization and ways of assessing the performance of these different tools. Methods for representing ambiguity are reviewed and a discussion of how the approaches transfer to different data types and modifications is presented.
Cells respond to elements in their extracellular milieu via a variety of signaling mechanisms, which may be generated in response to environmental cues, electrical stimuli, or chemical messengers. Mostly the signals are propagated by alteration of pre-existing proteins through the addition of post-translational modifications (PTMs) to key residues to change their structure and activity. These signals are then relayed and amplified until they ultimately manifest in changes in protein expression. It is increasingly clear that mis-regulation of PTMs is a major basis of disease, whether it causes aberrant signaling in cancer, or cytotoxic protein aggregation in neurodegenerative diseases. For this reason, the study of protein post-translational modifications is vital to understanding biological regulation (1: Pawson T. Scott J.D. Protein phosphorylation in signaling–50 years and counting. Trends Biochem. Sci. 2005; 30: 286-290).
The abbreviations used are: PTM, post-translational modification; FDR, false discovery rate; FLR, false localization rate; ABRF, Association of Biomolecular Resource Facilities; iPRG, Proteome Informatics Research Group; SLIP score, site localization in peptide score; CID, collision induced dissociation; ECD, electron capture dissociation; ETD, electron transfer dissociation.
Mass spectrometry-based proteomics has revolutionized the characterization of protein PTMs, in that it has created the first unbiased strategies to identify which proteins are being modified, with what types of modifications, and on which specific residue/s. Of the many types of PTMs used, protein phosphorylation has justifiably received the greatest attention (2: Boersema P.J. Mohammed S. Heck A.J. Phosphopeptide fragmentation and analysis by mass spectrometry. J. Mass Spectrom. 2009; 44: 861-878; 3: Grimsrud P.A. Swaney D.L. Wenger C.D. Beauchene N.A. Coon J.J. Phosphoproteomics for the masses. ACS Chem. Biol. 2010; 5: 105-119). However, there are many other regulatory modifications such as serine and threonine O-GlcNAcylation (4: Hart G.W. Housley M.P. Slawson C. Cycling of O-linked beta-N-acetylglucosamine on nucleocytoplasmic proteins. Nature. 2007; 446: 1017-1022), lysine or arginine methylation (5: Rathert P. Dhayalan A. Ma H. Jeltsch A. Specificity of protein lysine methyltransferases and methods for detection of lysine methylation of non-histone proteins. Mol. Biosyst. 2008; 4: 1186-1190), acetylation of lysine side-chains (6: Zhao S. Xu W. Jiang W. Yu W. Lin Y. Zhang T. Yao J. Zhou L. Zeng Y. Li H. Li Y. Shi J. An W. Hancock S.M. He F. Qin L. Chin J. Yang P. Chen X. Lei Q. Xiong Y. Guan K.L. Regulation of cellular metabolism by protein lysine acetylation. Science. 2010; 327: 1000-1004), or lysine ubiquitination (7: Kirkpatrick D.S. Denison C. Gygi S.P. Weighing in on ubiquitin: the expanding role of mass-spectrometry-based proteomics. Nat. Cell Biol. 2005; 7: 750-757) that are all transient and important to study and understand. There are many other PTMs used by the cell, both transient and stable, that affect protein activity. Indeed, different PTMs do not operate independently of each other, so by studying only a single modification type it is impossible to deconvolute their contributions to signaling mechanisms and cellular responses (8: Strahl B.D. Allis C.D. The language of covalent histone modifications. Nature. 2000; 403: 41-45; 9: Zeidan Q. Hart G.W. The intersections between O-GlcNAcylation and phosphorylation: implications for multiple signaling pathways. J. Cell Sci. 2010; 123: 13-22).
The large-scale analyses of all of these modifications follow a similar strategy, in which an enrichment step (antibody affinity, metal affinity, lectin affinity) is followed by tandem mass spectrometric analysis of the resulting mixture (10: Zhao Y. Jensen O.N. Modification-specific proteomics: strategies for characterization of post-translational modifications using enrichment techniques. Proteomics. 2009; 9: 4632-4641). Thousands of modified peptides may be reliably identified in these studies, but the challenge of determining modification site localizations among these peptides is less well addressed.
An MS/MS spectrum enables modification site localization when it contains one or more ions of unambiguously assignable ion type that are derived from fragmentation between two amino acids in the peptide that can bear the modification. When such an ion is not present, or the ion cannot be readily distinguished from noise, the modification cannot be confidently localized to either site.
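
The localization rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `site_determining_ions` is hypothetical, residue positions are 1-based, and only the b and y ion series are considered (no neutral losses, no multiply charged fragments).

```python
def site_determining_ions(peptide_length, site_a, site_b):
    """List the b and y ion indices produced by cleavage between two candidate sites.

    Positions are 1-based. A b_i ion contains residues 1..i and a y_j ion contains
    the last j residues, so only ions that hold exactly one of the two candidate
    residues can distinguish between the two localizations.
    """
    low, high = sorted((site_a, site_b))
    if low == high:
        raise ValueError("two distinct candidate sites are required")
    if low < 1 or high > peptide_length:
        raise ValueError("candidate sites must lie within the peptide")
    # b ions that include the first candidate but not the second
    b_ions = list(range(low, high))
    # y ions that include the second candidate but not the first
    y_ions = list(range(peptide_length - high + 1, peptide_length - low + 1))
    return b_ions, y_ions


if __name__ == "__main__":
    b_ions, y_ions = site_determining_ions(15, 3, 7)
    print("b ions:", b_ions)
    print("y ions:", y_ions)
```

For a 15-residue peptide with candidate sites at positions 3 and 7, the sketch returns b3 to b6 and y9 to y12. If none of these ions is observed above noise, the two localizations cannot be told apart.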
When only one amino acid in the peptide can bear the modification there can be no ambiguity as to the localization. Fig. 1 illustrates two spectra of phosphopeptides: one in which a site assignment is unambiguous and one in which there is no evidence to distinguish between two potential sites. The above rules make assessment of site localization reliability seem straightforward, and indeed it is for some spectra, but it can be complicated by three decisions: what is a real peak and what is noise; whether peaks that are merely suggestive of a modification site should be given any weighting (e.g. a peak that could correspond to either a phosphate loss from a modified fragment or a water loss from an unmodified fragment; the former is more commonly seen); and which residues in a peptide should be considered as potentially bearing a given modification (e.g. for a methylation, whether one also considers potential amino acid substitutions that would lead to the same mass change).
For peptide identification two statistical measures of reliability are routinely determined for results. The first of these is a measure of reliability for individual peptide identifications and is usually a measure of how likely a given quality of match will have been achieved by chance; i.e. the smaller this number (probability or expectation value) the more reliable the identification. The second reliability measure, a false discovery rate (FDR), is at the data set level, and estimates the total number of incorrect results being reported. The FDR is normally calculated by a target-decoy database searching strategy in which an assumption is made that the frequency of matching to the normal database and decoy database will be the same (11: Elias J.E. Gygi S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods. 2007; 4: 207-214). Unfortunately, no equivalent reliability measure, i.e. a false localization rate (FLR), can be easily calculated for modification site localization results, although approaches to estimate this have been employed and are discussed in a later section. The only situation in which a true FLR can be measured is when the correct modification site localizations are known. The most obvious situation in which this is the case is through the use of synthetic peptides (12: Savitski M.M. Lemeer S. Boesche M. Lang M. Mathieson T. Bantscheff M. Kuster B. Confident phosphorylation site localization using the Mascot Delta Score. Mol. Cell Proteomics. 2011), but another strategy that has been employed was to assess decoy residue site localizations in a phosphopeptide subset of data in which there was only one serine, threonine or tyrosine present per peptide (13: Baker P.R. Trinidad J.C. Chalkley R.J. Modification site localization scoring integrated into a search engine. Mol. Cell Proteomics. 2011; (M111.008078)).
Although all search engines have a score that measures the certainty of peptide identification, only a few currently have integrated an additional score to measure the reliability of modification site localization (13; 14: Cox J. Neuhauser N. Michalski A. Scheltema R.A. Olsen J.V. Mann M. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 2011; 10: 1794-1805; 15: Albuquerque C.P. Smolka M.B. Payne S.H. Bafna V. Eng J. Zhou H. A multidimensional chromatography technology for in-depth phosphoproteome analysis. Mol. Cell Proteomics. 2008; 7: 1389-1396; 16: Spectrum Mill - Agilent Technologies Inc.; Available from: http://www.chem.agilent.com/en-US/Products/software/chromatography/ms/spectrummillformasshunterworkstation/pages/default.aspx). In the meantime, a variety of different post-search engine tools have emerged to address modification site localization (12; 17: Beausoleil S.A. Villén J. Gerber S.A. Rush J. Gygi S.P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006; 24: 1285-1292; 18: Olsen J.V. Blagoev B. Gnad F. Macek B. Kumar C. Mortensen P. Mann M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell. 2006; 127: 635-648; 19: Ruttenberg B.E. Pisitkun T. Knepper M.A. Hoffert J.D. PhosphoScore: an open-source phosphorylation site assignment tool for MSn data. J. Proteome Res. 2008; 7: 3054-3059; 20: Bailey C.M. Sweet S.M. Cunningham D.L. Zeller M. Heath J.K. Cooper H.J. SLoMo: automated site localization of modifications from ETD/ECD mass spectra. J. Proteome Res. 2009; 8: 1965-1971; 21: Swaney D.L. Wenger C.D. Thomson J.A. Coon J.J. Human embryonic stem cell phosphoproteome revealed by electron transfer dissociation tandem mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 2009; 106: 995-1000; 22: Taus T. Köcher T. Pichler P. Paschke C. Schmidt A. Henrich C. Mechtler K. Universal and confident phosphorylation site localization using phosphoRS. J. Proteome Res. 2011; 10: 5354-5362; 23: Edwards N. Wu X. Tseng C.W. An unsupervised, model-free, machine-learning combiner for peptide identifications from tandem mass spectra. Clin. Proteomics. 2009; 5: 23-36).
Prior to the availability of site localization scoring programs the only option for assessing the site localization reported was "manual verification"; i.e. the researcher looking at each spectrum in turn and making a judgment on whether they deemed the site localization reliable. This is obviously a subjective process, heavily reliant on the expertise of the researcher (and their patience to thoroughly assess as many as a thousand spectra). Indeed, prior to journal publication guidelines forcing researchers to assess site localization reliability (24: Bradshaw R.A. Burlingame A.L. Carr S. Aebersold R. Reporting protein identification data: the next generation of guidelines. Mol. Cell Proteomics. 2006; 5: 787-788) there were many PTM studies published in which this question was not even addressed; i.e. they reported results as returned by the relevant search engine with no further analysis. The effect of this was the listing of some site localizations that upon closer inspection of the data should not have been reported. This would be bad enough in itself, but PTM databases have extracted these results to populate their resources and it is not immediately apparent from these databases which site localizations were determined based on these generally less stringent standards. This is discussed in more detail in a later section of this manuscript.
When the Proteome Informatics Research Group (iPRG) of the Association of Biomolecular Resource Facilities (ABRF) conducted a study in 2010 on identifying phosphopeptides and localizing phosphorylation sites, the 22 participants who attempted to assess modification site localization reported the use of nine named pieces of software and a further six custom or in-house tools (25: Rudnick, P. A., Askenazi, M., Clauser, K. R., Lane, W. S., Martens, L., McDonald, W. H., Mertins, P., Meyer-Arendt, K., Searle, B. C., Kowalak, J. A., Proteome Informatics Research Group 2010 Study. Available from: http://www.abrf.org/index.cfm/group.show/ProteomicsInformaticsResearchGroup.53.htm). Despite the range of tools deployed, most implement one of two basic strategies: they either try to assess the chance that a given peak that allows site determination was matched at random, or calculate a search engine score difference between peptide identifications with different site localizations. Tools employing the former strategy are A-Score (17), PTM Score (MaxQuant/Andromeda) (18), the Phosphorylation Localization Score (PLS) in Inspect (15), SLoMo (20), Phosphinator (26: Phanstiel D.H. Brumbaugh J. Wenger C.D. Tian S. Probasco M.D. Bailey D.J. Swaney D.L. Tervo M.A. Bolin J.M. Ruotti V. Stewart R. Thomson J.A. Coon J.J. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nat. Methods. 2011; 8: 821-827), and PhosphoRS (22), whereas examples of the latter strategy include Mascot Delta Score (12), the SLIP score in Protein Prospector (13), and the variable modification localization (VML) score in Spectrum Mill (Agilent). PepArML (27: PepArML. Available from: https://edwardslab.bmcb.georgetown.edu/pymsio/) provides experimental site localization scores in its results using an approach related to the latter strategy, in which site localization scores are calculated for each post-translational modification observed on a peptide by summing together the confidence scores of its peptide identifications and normalizing by the total PepArML confidence associated with the peptide (analogous to how PTM Score converts its probability estimates into site scores) (N. Edwards, personal communication). Table I summarizes features of some of these tools, and the remaining sections of this article will contrast these different site localization strategies, compare their performance and discuss their applicability for data types other than phosphorylation data acquired in low resolution ion traps, which have been the main data type assessed so far using these tools.
Table I. Comparison of some of the leading modification site localization scoring tools
| Software | Peak Picking | Scoring: Peak Probability (PP) or Difference Score (DS) | Score Uses High Mass Accuracy? | Score Has Same Meaning for High Mass Accuracy Data? | Representing Ambiguity |
|---|---|---|---|---|---|
| A-Score | n peaks per 100 Th | PP | N | Y | Reports best scoring site; does not indicate next best alternative |
| PTM Score (Andromeda) | n peaks per 100 Th | PP | N | Y | Reports probability score for all sites |
| Mascot Delta Score | n peaks per 110 Th | DS | N | A given score becomes more reliable for higher mass accuracy data | Reports best scoring site; does not indicate next best alternative |
| SLIP Score (Protein Prospector) | 20 most intense peaks in each half of observed m/z range | DS | Y | Similar | Lists all potential sites within score threshold |
| VML Score (Spectrum Mill) | 25 peaks with the highest signal/noise following removal of isotopes, precursor, and noise | DS | Y | Similar | Lists all potential sites within score threshold |
An often under-appreciated crucial step in both peptide identification and modification site localization is deciding which masses in a peak list file to use for analysis. Within the peak list there will be a mixture of masses that are fragment ions from the peptide of interest, and other masses that are produced by chemical or electrical noise. As the noise peaks are generally of lower intensity, the decision of which peaks to use is normally on the basis of intensity.
Most tools employ an intensity threshold, in which only peaks above the threshold are considered, although an intensity-based cross-correlation approach similar to that employed by the search engine Sequest has also been employed (19). Different methods are employed by tools for thresholding. The simplest approach is to employ a universal intensity threshold across the whole spectrum. However, not all fragment ion peaks encode the same amount of information. Sequence ions (for CID, b and y ions) are more informative than internal ions or immonium ions, and ions that are formed by fragmentation nearer the middle of the peptide are more information-rich, in that they are defining the mass of a longer stretch of amino acids. For these reasons, a peak thresholding strategy that ensures peak representation over a wide m/z range can lead to better sensitivity in peptide identification and site localization. Batch-Tag in Protein Prospector splits the observed m/z range in half to create a peak list for searching with the same number of peaks listed in each half of the m/z range (typically the 20 most intense peaks in each half, giving a 40-peak list). A third strategy is to split the spectrum into m/z bins, then use the "n" most intense peaks within each bin for searching. This is the strategy employed by A-Score (100 Th bins), PTM Score (100 Th bins), and Mascot (110 Th bins). This strategy guarantees peak representation throughout the m/z range, but will lead to low intensity peaks being used in "quiet" parts of the spectrum. This is exemplified in Fig. 2, which plots peak lists generated using the 20 + 20 and 4 per 100 Th strategies for a spectrum of the phosphopeptide RGT(Phospho)VEGSVQEVQEEK.
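
The two range-aware peak selection strategies described above can be sketched as follows. This is a minimal illustration, not the implementation used by any of the named tools: the function names are hypothetical, peaks are assumed to be `(m/z, intensity)` tuples, and bins are assumed to start at multiples of the bin width, which the real tools may handle differently.

```python
def top_n_per_half(peaks, n=20):
    """Keep the n most intense peaks in each half of the observed m/z range."""
    if not peaks:
        return []
    mz_values = [mz for mz, _ in peaks]
    midpoint = (min(mz_values) + max(mz_values)) / 2.0
    lower = [p for p in peaks if p[0] <= midpoint]
    upper = [p for p in peaks if p[0] > midpoint]

    def most_intense(group):
        return sorted(group, key=lambda p: p[1], reverse=True)[:n]

    # Return the selected peaks in ascending m/z order
    return sorted(most_intense(lower) + most_intense(upper))


def top_n_per_bin(peaks, n=4, bin_width=100.0):
    """Keep the n most intense peaks within each m/z bin of the given width."""
    bins = {}
    for mz, intensity in peaks:
        # Assumption: bin edges sit at multiples of bin_width
        bins.setdefault(int(mz // bin_width), []).append((mz, intensity))
    selected = []
    for group in bins.values():
        selected.extend(sorted(group, key=lambda p: p[1], reverse=True)[:n])
    return sorted(selected)


if __name__ == "__main__":
    toy_spectrum = [(150.0, 5), (160.0, 1), (170.0, 9), (450.0, 2), (460.0, 3)]
    print(top_n_per_half(toy_spectrum, n=2))
    print(top_n_per_bin(toy_spectrum, n=1))
```

The binned version always returns something from every populated bin, which is how weak peaks in otherwise quiet regions end up in the list.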
In each strategy there are a similar number of peaks in the list (40 peaks versus 42 peaks, respectively). However, there are several peaks unique to each approach, including some that correspond to b and y ions of the correct peptide identification. In this example each peak list contains one peak that can be used to localize the phosphorylation to the threonine residue, although it is a different peak in each case: the b4 ion in the 20 + 20 peak list and the y10 ion in the 4 per 100 Th peak list. The Protein Prospector peak list creation strategy has been compared with the 100 Th binning strategy on the same data set (13). The differences were modest in comparison to a 4 peaks per 100 Th strategy, but the 20 + 20 peak list led to slightly more correct and slightly fewer incorrect site localizations, thereby returning more correct results at a lower FLR. Considering more peaks (5 peaks per 100 Th) produced a measurably higher FLR, mainly because of reporting incorrect site localizations for spectra in which the site localizations were deemed ambiguous using the other peak list generation strategies.
The first two significant site localization scoring tools, A-Score and PTM Score, use similar approaches to score assignments. In both cases the tools were developed for assessing phosphorylation site identifications in low mass-accuracy ion trap CID data. Both tools treat observed peaks as integer masses, which is not an unreasonable step to take given that this type of data is typically measured with an m/z accuracy of roughly ± 0.5 Th. They then make the assumption that every peak mass is equally likely to be observed at random, so that if, e.g., four peaks are considered per 100 Th, the chance of randomly matching one of these peaks is 4 in 100.
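
To make the "4 in 100" reasoning concrete, the sketch below computes the chance of matching at least a given number of predicted ions at random. It assumes the cumulative binomial form (the binomial probability model mentioned later in this article); the function name and parameter names are hypothetical.

```python
from math import comb


def chance_match_probability(n_matched, n_predicted, peaks_per_100_th=4):
    """Probability of matching at least n_matched of n_predicted ions by chance.

    Assumes every integer mass is equally likely to hold a peak, so the
    per-ion random match probability is peaks_per_100_th / 100.
    """
    p = peaks_per_100_th / 100.0
    return sum(
        comb(n_predicted, k) * p**k * (1.0 - p) ** (n_predicted - k)
        for k in range(n_matched, n_predicted + 1)
    )


if __name__ == "__main__":
    # Chance of matching a single predicted ion at a depth of 4 peaks per 100 Th
    print(chance_match_probability(1, 1))
    # Matching more of the predicted ions by chance is progressively less likely
    print(chance_match_probability(2, 6), chance_match_probability(3, 6))
```

Under this model, raising the peak depth raises the per-ion random match probability, so the same number of matched ions counts for less.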
After calculating their probabilities, both tools then convert the values into –10 log10(p) scores. The tools differ in how they use this score. PTM Score uses this approach to calculate probability scores for the peptide identification as a whole with each possible site localization, then converts these scores into probabilities for each potential site localization by making the sum of all of the scores equal a probability of 100% (as the peptide is definitely modified somewhere) and allocating probabilities to each site in the peptide on this normalized probability scale (18). In contrast, A-Score calculates its probability score based only on matching potential "site-determining" b and y ions; i.e. peaks that would be the same for all site localizations are ignored. Instead of reporting site localization scores for all sites, it reports a score only for the best site localization, which is a difference score between this site localization and the next best possibility (17). In the original publication of A-Score, the authors proposed using a threshold score of 19, which would mathematically correspond to a probability P of ∼0.01, or a site being localized with 99% certainty (17).
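
The score conversion and the two ways of reporting it can be sketched as below. All function names are hypothetical. The normalization step is only one possible reading of the description above: it assumes each site is weighted by 10^(score/10), i.e. by the inverse of its chance probability, which is an assumption rather than something this article states.

```python
from math import log10


def to_score(p):
    """Convert a chance probability into a -10 log10(p) score."""
    return -10.0 * log10(p)


def to_probability(score):
    """Invert the conversion: recover the probability behind a score."""
    return 10.0 ** (-score / 10.0)


def difference_score(site_scores):
    """Return the best site and its score margin over the next best site."""
    ranked = sorted(site_scores.items(), key=lambda item: item[1], reverse=True)
    (best_site, best), (_, runner_up) = ranked[0], ranked[1]
    return best_site, best - runner_up


def normalized_site_probabilities(site_scores):
    """Spread a total probability of 1 across candidate sites.

    Assumption: weights are proportional to 10 ** (score / 10).
    """
    weights = {site: 10.0 ** (s / 10.0) for site, s in site_scores.items()}
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}


if __name__ == "__main__":
    print(to_probability(19))  # the probability behind a threshold score of 19
    print(difference_score({"T3": 45.0, "S7": 26.0}))
    print(normalized_site_probabilities({"T3": 30.0, "S7": 10.0}))
```

A threshold of 19 corresponds to a probability slightly above 0.01 under this conversion, consistent with the "∼0.01" quoted above.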
Later phosphoproteomic papers from the authors' laboratory often use a more liberal threshold (p < 0.05, A-Score > 13) (28: Huttlin E.L. Jedrychowski M.P. Elias J.E. Goswami T. Rad R. Beausoleil S.A. Villén J. Haas W. Sowa M.E. Gygi S.P. A tissue-specific atlas of mouse protein phosphorylation and expression. Cell. 2010; 143: 1174-1189). If one enters a range of values into the A-Score probability equation, it is clear that achieving a score >13 requires the best localization to have at least two more site-determining ions matched in the MS/MS spectrum, regardless of the number of amino acids separating two candidate localization sites. The accuracy of the probabilities calculated by these methods is dependent on the assumptions used to fit MS/MS data into the binomial probability model. One of the unrealistic assumptions is that all masses are equally likely to be observed at random. Amino acids use a limited range of elements and several only differ by a methyl group from each other, so some masses are in practice much more likely to be observed than others. For example, a peak at mass 201 could be a b2 ion formed by combining the amino acids EA, LS, IS or TV. On the other hand a peak at m/z 206 cannot be formed by any amino acid combination. However, because in each case the tools are comparing or normalizing to other site localization scores, this issue may be mitigated. Nevertheless, instead of treating these probability-based scores as accurate measures of site-localization certainty, they should be considered more simply as just scores subject to a threshold determined by the user to suit individual certainty objectives.
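
The observation that nominal masses are not equally likely can be checked for the b2 example given above. The sketch below enumerates two-residue compositions by nominal (integer) residue mass. It rests on assumptions: only singly charged b2 ions are considered, the function name is hypothetical, and cysteine is taken as carbamidomethylated (nominal 160), a common sample preparation choice that is not stated in this article. With unmodified cysteine (nominal 103), the pair C + P would also give 201.

```python
from itertools import combinations_with_replacement

# Nominal (integer) residue masses of the 20 common amino acids
NOMINAL_RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
    "C": 160,  # assumption: carbamidomethylated cysteine (103 + 57)
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def b2_compositions(nominal_mz):
    """List unordered residue pairs whose singly charged b2 ion has this nominal m/z."""
    # A singly charged b ion is the sum of its residue masses plus one proton
    target = nominal_mz - 1
    return sorted(
        first + second
        for first, second in combinations_with_replacement(sorted(NOMINAL_RESIDUE_MASS), 2)
        if NOMINAL_RESIDUE_MASS[first] + NOMINAL_RESIDUE_MASS[second] == target
    )


if __name__ == "__main__":
    print(201, b2_compositions(201))
    print(206, b2_compositions(206))
```

Pairs are reported in alphabetical order within each composition, so "AE" stands for both EA and AE. Longer fragments and y ions are outside the scope of this sketch.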
In contrast, the scoring in PhosphoRS attempts to overcome the issue of peak distribution across the mass range of the spectrum by replacing the core probability calculation of N peaks per 100 Th with (N × d)/w, where N is the total number of extracted peaks, d is the specified fragment ion mass tolerance, and w is the full mass range of the MS/MS spectrum (22). Not only does this probability adjustment allow for different regions of an MS/MS spectrum to contain vastly different numbers of peaks with different optimal peak depths for distinct m/z windows, but it also directly allows for the use of narrow mass tolerances appropriate for data generated on high resolution instruments.
A database search engine is going to consider all of the potential site localizations in a peptide when performing peptide identification. Hence, there is information in the search results that can provide a measure of site localization reliabil
