Review Open access Peer reviewed

Building and Searching Tandem Mass Spectral Libraries for Peptide Identification

2011; Elsevier BV; Volume: 10; Issue: 12; Language: English

10.1074/mcp.r111.008565

ISSN

1535-9484

Authors

Henry Lam

Topic(s)

Metabolomics and Mass Spectrometry Studies

Abstract

Spectral library searching is an emerging approach to peptide identification from tandem mass spectra, a critical step in proteomic data analysis. Conceptually, the premise of this approach is that the tandem MS fragmentation pattern of a peptide under some fixed conditions is a reproducible fingerprint of that peptide, such that unknown spectra acquired under the same conditions can be identified by spectral matching. In practice, a spectral library is first meticulously compiled from a large collection of previously observed and identified tandem MS spectra, usually obtained from shotgun proteomics experiments of complex mixtures. A query spectrum is then identified by spectral matching using recently developed spectral search engines. This review discusses the basic principles of the two pillars of this approach: spectral library construction and spectral library searching. An overview of the software tools available for these two tasks, as well as a high-level description of the underlying algorithms, will be given. Finally, several new methods that utilize spectral libraries for peptide identification in ways other than straightforward spectral matching will also be described.

In the past decade and a half, mass spectrometry-based proteomics has witnessed breathtaking advances. Today, many top research universities and institutes are equipped with proteomics facilities, and the ability to detect and quantify a large number of proteins in a high-throughput manner is having a positive and growing impact in life science research. Among the many proposed experimental workflows, the most widely practiced method is probably the “bottom-up” approach of shotgun proteomics. The key steps of this approach are: (1) proteins are digested into shorter peptides that are more amenable to liquid chromatography (LC)-MS analysis, (2) peptides are further fragmented by tandem mass spectrometry (MS/MS) to yield characteristic fragmentation patterns, and (3) the MS/MS spectra are assigned to their originating peptides by various computational methods (1: Aebersold R., Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207; 2: Domon B., Aebersold R. Mass spectrometry and protein analysis. Science. 2006; 312: 212-217; 3: Washburn M.P., Wolters D., Yates J.R. 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 2001; 19: 242-247; 4: Steen H., Mann M. The ABC's (and XYZ's) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 2004; 5: 699-711).

This last step of assigning MS/MS spectra to their peptide identifications is often the rate-limiting step of the whole proteomics experiment, and has received well-deserved attention over the past decade. These computational methods can be generally classified into three groups, in terms of the “search space,” i.e. the set of candidate sequences to consider as possible answers (Fig. 1).

On one end of the search space scale are de novo sequencing methods (5: Dancik V., Addona T.A., Clauser K.R., Vath J.E., Pevzner P.A. De novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 1999; 6: 327-342; 6: Ma B., Zhang K., Hendrie C., Liang C., Li M., Doherty-Kirby A., Lajoie G. PEAKS: Powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 2003; 17: 2337-2342; 7: Frank A., Pevzner P. PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal. Chem. 2005; 77: 964-973; 8: Pitzer E., Masselot A., Colinge J. Assessing peptide de novo sequencing algorithms performance on large and diverse data sets. Proteomics. 2007; 7: 3051-3054), which make no initial assumption about what peptides might be present in the sample. Instead, the algorithms exhaustively consider all permutations of the 20 amino acids as viable candidates, and try to infer the sequence directly from the MS/MS spectra.

In the middle of the search space scale are sequence database searching methods (9: Eng J.K., McCormack A.L., Yates J.R. 3rd. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989; 10: Perkins D.N., Pappin D.J.C., Creasy D.M., Cottrell J.S. Probability-based protein identification by searching sequence database using mass spectrometry data. Electrophoresis. 1999; 20: 3551-3567; 11: Craig R., Beavis R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004; 20: 1466-1467; 12: Geer L.Y., Markey S.P., Kowalak J.A., Wagner L., Xu M., Maynard D.M., Yang X., Shi W., Bryant S.H. Open mass spectrometry search algorithm. J. Proteome Res. 2004; 3: 958-964; 13: MacCoss M.J. Computational analysis of shotgun proteomics data. Curr. Opin. Chem. Biol. 2005; 9: 88-94), which rely on available sequence databases to limit the search space to only peptides that are derivable from known protein sequences. In these methods, each candidate peptide sequence is mapped to a list of expected fragment ions based on some simple rules of peptide fragmentation, to which the query (unknown) spectrum is compared. Aided by the rapid advances in genome sequencing and gene prediction that produce protein sequence databases for many model organisms, sequence database searching has become the method of choice for most proteomics researchers, despite its great demand for computational power.

Toward the other end of the search space scale is spectral library searching (14: Yates J.R. 3rd, Morgan S.F., Gatlin C.L., Griffin P.R., Eng J.K. Method to compare collision-induced dissociation spectra of peptides: Potential for library searching and subtractive analysis. Anal. Chem. 1998; 70: 3557-3565; 15: Craig R., Cortens J.C., Fenyo D., Beavis R.C. Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 2006; 5: 1843-1849; 16: Frewen B.E., Merrihew G.E., Wu C.C., Noble W.S., MacCoss M.J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 2006; 78: 5678-5684; 17: Lam H., Deutsch E.W., Eddes J.S., Eng J.K., King N., Stein S.E., Aebersold R. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics. 2007; 7: 655-667). In spectral library searching, the search space is further restricted to only those peptides that have been previously detected and identified, and whose fragmentation patterns have been experimentally recorded and compiled into spectral libraries. Spectral library searching is essentially a straightforward spectral matching exercise, and can be orders of magnitude faster than the other approaches because of its much reduced search space. This latter approach is the subject of this review.

Spectral library searching is relatively new in proteomics, but has a long history in the mass spectrometric analysis of small molecules. The widely used NIST/NIH/EPA mass spectral library (http://www.nist.gov/srd/nist1a.cfm), developed by the National Institute of Standards and Technology (NIST), contains over 200,000 mass spectra of mostly small organic molecules (18: Domokos L., Hennberg D., Weimann B. Computer-aided identification of compounds by comparison of mass spectra. Anal. Chim. Acta. 1984; 165: 61-74; 19: Stein S.E., Scott D.R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 1994; 5: 859-886; 20: Owens K.G. Application of correlation analysis techniques to mass spectral data. Appl. Spectrosc. Rev. 1992; 27: 1-49). It was only in 1998, however, that the concept of spectral library searching was introduced to proteomics, in the work of Yates et al. (14), which demonstrated that peptide MS/MS spectra are reproducible enough for this approach to be effective. However, at the time mass spectrometers were slow, proteomics data was scarce, and automatic data analysis methods were in their infancy. There was no conceivable way to build comprehensive spectral libraries for use in spectral searching. As a result, this elegant idea failed to catch on until 2006, when several groups published spectral searching methods: X!Hunter (15) and Bibliospec (16) almost simultaneously, and SpectraST (17) a few months later. By then, the technological platform of shotgun proteomics and sequence searching methods had become more mature and widely used, leading to the rapid accumulation of MS/MS data. NIST began in 2006 to extend their mass spectral library to include peptides, and that effort continues in earnest today.
At present, nearly 1 million reference spectra of peptides in 18 libraries of different organisms and biological samples have been compiled and made freely available as the NIST Libraries of Peptide Tandem Mass Spectra (http://peptide.nist.gov/).

Conceptually, the premise of spectral library searching is very simple: the fragmentation pattern of a molecule under some fixed conditions is a reproducible fingerprint of that molecule, such that unknown spectra acquired under the same conditions can be identified by spectral matching. Granted, in practice spectra will inevitably contain experimental artifacts (e.g. random noise and signals from contaminants), and the fragmentation conditions might not be exactly the same. But very much like fingerprinting in forensic science, imperfect matches do not necessarily preclude correct identification, because the fingerprint typically contains far more information than is necessary to distinguish a significant match from a spurious one. In fact, in spectral library searching, all the features of a reference spectrum, including peak intensities and the presence of minor ions, are used, and similarity is more globally and precisely determined (Fig. 2). This is in contrast to sequence searching, which usually assumes nothing about peak intensities and ignores all noncanonical ions, primarily because of the difficulty in predicting these features for each candidate peptide in a sequence-specific manner.

The effectiveness of this approach depends on (1) high-quality reference spectra, with good signal-to-noise ratios and devoid of impurities, and (2) effective matching algorithms with the robustness and flexibility to accommodate imperfect matches while minimizing false matches. The former is about constructing spectral libraries, and the latter, searching them.
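
To make the idea of intensity-aware spectral matching concrete, the sketch below scores a query against a reference spectrum with a normalized dot product over binned peaks. This is an illustrative assumption rather than the scoring function of any particular search engine: the 1.0 m/z bin width, the square-root intensity scaling, and the function names `bin_spectrum` and `dot_product_similarity` are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum square-root-scaled intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) tuples.
    The square-root scaling (an assumption of this sketch) keeps a few
    dominant peaks from overwhelming the minor ions.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += math.sqrt(intensity)
    return binned


def dot_product_similarity(query, reference, bin_width=1.0):
    """Normalized dot product (cosine) of two peak lists, between 0 and 1."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    shared = sum(q[b] * r[b] for b in q.keys() & r.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in r.values())
    )
    return shared / norm if norm > 0 else 0.0


# Hypothetical peak lists: the query carries one extra low-intensity noise peak.
reference = [(175.1, 100.0), (262.2, 40.0), (375.3, 80.0)]
query = [(175.1, 90.0), (262.1, 50.0), (375.3, 70.0), (500.5, 5.0)]
score = dot_product_similarity(query, reference)
```

Because every binned peak contributes in proportion to its intensity, a small noise peak lowers the score only slightly, which is the tolerance to imperfect matches described above.
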
In the following sections, these two pillars of spectral library searching will be discussed in detail, and the progress made in the field over the past 5 years will be reviewed.

Spectral libraries are nothing more than searchable collections of identified spectra. Nonetheless, the conceptual simplicity of this idea belies the complexity of the actual library building process for proteomics applications. The unique difficulty of proteomics is the enormous variety of naturally occurring peptides, which makes it impractical to synthesize purified peptides to generate reference mass spectra for the entire proteome. Instead, the practical approach is to collect spectra from complex mixtures, such as bodily fluids and cell lysates, in typical shotgun proteomics experiments, and identify them by sequence database searching. Library building, therefore, must start from the tedious and often error-prone procedure of assigning MS/MS spectra to peptide identifications, and must cope with all the well-known pitfalls and limitations of this process.

Nonetheless, for most researchers who are interested in adopting spectral library searching in their data analysis, it suffices to know that there are already high-quality, free-of-charge spectral libraries for proteomics applications that can be downloaded with the click of a button. Since 2006, there have been centralized efforts to build peptide MS/MS spectral libraries, most notably by NIST and also by others (15, 16, 21: Deutsch E.W., Lam H., Aebersold R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 2008; 9: 429-434). These endeavors seek to collect data from many laboratories and a wide variety of instrument platforms, so as to maximize the coverage and quality of the libraries. In this endeavor, emerging data repositories such as PeptideAtlas (21), PRIDE (22: Martens L., Hermjakob H., Jones P., Adamski M., Taylor C., States D., Gevaert K., Vandekerckhove J., Apweiler R. PRIDE: the proteomics identifications database. Proteomics. 2005; 5: 3537-3545), and Tranche (23: Hill J.A., Smith B.E., Papoulias P.G., Andrews P.C. ProteomeCommons.org collaborative annotation and project management resource integrated with the Tranche repository. J. Proteome Res. 2010; 9: 2809-2811) have played a key enabling role. The data thus collected is often pushed through state-of-the-art data analysis pipelines to get the most out of the data, and to maintain a consistent standard for identification accuracy. These methods are constantly evolving and usually involve multiple search engines, validation by error modeling or decoy searching, and various postsearch quality filters. NIST, for example, uses no fewer than four different sequence search engines, as well as additional independent postsearch filters, in building their spectral libraries. Currently, however, the publicly available libraries cover only a handful of model organisms, contain mostly ion-trap collision-induced dissociation (CID) spectra, and include only modified peptides with the most common amino acid modifications, such as methionine oxidation and N-terminal acetylation.
(For more information, please see http://peptide.nist.gov/.)

Alternatively, researchers can build spectral libraries from their own data, especially when no suitable public spectral library is available for their particular biological system. It is worth noting that a custom-built spectral library is a concise summary of the individual research group's observed proteomes of interest, and building a spectral library can also be viewed as a means of data storage and organization. In the process of building a spectral library, each spectrum is reconnected with its identification, redundancy is reduced, unidentified spectra are discarded, and relevant meta-data about the observed peptides can be aggregated. The cumbersome raw data is converted to a form that can be indexed for meaningful retrieval, and through spectral library searching, past and future observations of the same peptide are automatically linked (24: Lam H., Deutsch E.W., Eddes J.S., Eng J.K., Stein S.E., Aebersold R. Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods. 2008; 5: 873-875).

Currently, the spectral search engines X!Hunter, Bibliospec, and SpectraST all provide functionality for building spectral libraries from sequence search results, whether in separate scripts (for X!Hunter and Bibliospec) or as options integrated into the same program (for SpectraST).
All three software packages provide detailed documentation and instructions, and the interested reader is referred to the respective websites of these projects for more information (Table I).

Table I. Useful websites for spectral library building and searching tools

NIST MS Search
• Software and library download, instructions: http://peptide.nist.gov/

X!Hunter
• Software download: ftp://ftp.thegpm.org/projects/xhunter/binaries
• Library download: ftp://ftp.thegpm.org/projects/xhunter/libs
• Instructions: http://h201.thegpm.org/docs/xhunter_system.html
• Web client to X!Hunter on remote server: http://xhunter.thegpm.org/

Bibliospec
• Software download: http://depts.washington.edu/ventures/UW_Technology/Express_Licenses/bibliospec.php
• Library download and instructions: http://proteome.gs.washington.edu/software/bibliospec/documentation/

SpectraST
• Software download (SpectraST is part of TPP): http://sourceforge.net/projects/sashimi/files/
• Library download: http://www.peptideatlas/speclib/ and http://peptide.nist.gov/
• Instructions: http://tools.proteomecenter.org/wiki/index.php?title=SpectraST
• Web client to SpectraST on remote server: http://www.peptideatlas.org/spectrast/

The actual process of building a spectral library from experimentally collected MS/MS spectra can be roughly divided into 5 steps (Fig. 3). First, the spectra are analyzed by traditional peptide identification tools, most commonly sequence search engines. Second, some form of statistical validation is performed to screen for confident identifications, according to some predefined standard for identification accuracy. Third, the spectra and their respective identifications must be retrieved from various files and linked together, and entries from as many data sets as possible are combined. This includes the mundane but tedious task of integrating data and search results from different locations and different formats.
Fourth, once the “raw” library is built, spectra assigned to the same peptide ion identification (termed replicates) are merged to produce a single, representative “consensus” spectrum for that peptide ion, thereby reducing redundancy and improving search speed. An alternative and simpler approach is to select the “best” replicate among all to represent the peptide ion. The fifth and final step is quality control, the process by which incorrectly identified or noisy spectra are selectively removed from the library.

The last two steps deserve some elaboration, as their importance is often overlooked. The step of consensus creation, beyond its obvious role in reducing redundancy, has a large impact on the effectiveness of subsequent spectral searching. A good consensus algorithm increases the signal-to-noise ratio of the resulting consensus spectrum by taking advantage of the fact that noise, by definition, does not appear consistently across replicates, whereas signals should be conserved. Consensus spectrum creation is thus somewhat analogous to the practice of taking multiple measurements of a physical quantity and reporting an average that evens out the noisy fluctuations of individual measurements. It has been shown that the consensus approach is better than the “best-replicate” approach for reducing redundancy, and that the more replicates go into forming the consensus, the higher its signal-to-noise ratio, up to a certain saturation limit (24).
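
The averaging analogy can be sketched in a few lines. The example below merges replicate peak lists by binning, keeps only bins that hold a peak in a majority of replicates, and averages the intensities of the surviving bins. It is a simplified illustration, not the NIST or SpectraST implementation: the bin width, the majority threshold `min_fraction`, the equal weighting of replicates, and the function name `consensus_spectrum` are assumptions of this sketch.

```python
from collections import defaultdict


def consensus_spectrum(replicates, bin_width=1.0, min_fraction=0.5):
    """Merge replicate peak lists into one consensus peak list.

    replicates: list of peak lists, each a list of (mz, intensity) tuples.
    A bin is kept only if more than `min_fraction` of the replicates have
    a peak in it; its intensity is the mean over all replicates and its
    m/z is the intensity-weighted mean of the contributing peaks.
    """
    if not replicates:
        return []
    votes = defaultdict(int)            # replicates with a peak in the bin
    intensity_sum = defaultdict(float)  # summed intensity per bin
    mz_weighted = defaultdict(float)    # summed mz * intensity per bin
    for peaks in replicates:
        seen = set()
        for mz, intensity in peaks:
            b = int(mz / bin_width)
            intensity_sum[b] += intensity
            mz_weighted[b] += mz * intensity
            seen.add(b)
        for b in seen:  # one vote per replicate, however many peaks it has
            votes[b] += 1
    n = len(replicates)
    consensus = []
    for b in sorted(votes):
        if votes[b] / n > min_fraction and intensity_sum[b] > 0:
            consensus.append((mz_weighted[b] / intensity_sum[b],
                              intensity_sum[b] / n))
    return consensus


# Hypothetical replicates: the peaks near 410.7 and 333.3 each occur once.
replicates = [
    [(175.1, 100.0), (262.2, 40.0), (410.7, 8.0)],
    [(175.1, 90.0), (262.2, 50.0)],
    [(175.2, 110.0), (262.1, 45.0), (333.3, 6.0)],
]
merged = consensus_spectrum(replicates)
```

In this toy input the two peaks seen in only one replicate are dropped, while the two conserved peaks survive with averaged intensities. Fixed-width binning is the crudest possible peak alignment; a production tool would need something more careful.
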
In terms of actual implementation, the consensus algorithm must also deal effectively with cases where replicates are too dissimilar to each other (perhaps because of ineffective fragmentation, contamination, or false identification in some replicates), and where some replicates are much noisier than others. To achieve some robustness against the wide variety of spectra that it might encounter, SpectraST (which implements an algorithm similar to NIST's) attempts to detect problematic replicates and remove them before merging, and weights the replicates by quality rather than taking a straight average. It also employs a “peak voting” method whereby only peaks consistently present in a majority of replicates are admitted into the consensus, further reducing the chance of retaining noise or impurity peaks. For peptide ions that are observed only once, the consensus approach is unavailable, but some spectrum cleaning steps can still be taken to reduce noise before such “singleton” spectra are admitted into the spectral library.

The consensus spectral library generated in this manner may still contain occasional bad apples that need to be thrown out to ensure the eventual success of spectral searching. Two types of spectra are unwelcome in spectral libraries: (1) incorrectly identified spectra, and (2) extremely noisy or heavily contaminated spectra. The former are products of fallible sequence search engines, and will propagate errors in spectral searching if allowed into spectral libraries. The latter not only cause false positives through nonspecific matches, but also raise the background of similarity scores and thereby reduce the discriminating power of the search engine, again because of their propensity to form indiscriminate partial matches. It has been shown that the detection and removal of these undesirable spectra from spectral libraries contributes to greater sensitivity of the spectral search (24).

There are two major mechanisms for filtering spectra for quality control. The first is no different from ordinary statistical validation of sequence search results. Namely, one attempts to reduce false positives and bad spectra by setting appropriate thresholds on sequence search scores. In this regard, it is worth noting that library building generally involves much greater amounts of data, most of which are repeated samplings of the same proteome, than typical proteomics experiments. A library builder must therefore be keenly aware of the problem of accumulating false positives as data volume increases (25: Reiter L., Claassen M., Schrimpf S.P., Jovanovic M., Schmidt A., Buhmann J.M., Hengartner M.O., Aebersold R. Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. Mol. Cell. Proteomics. 2009; 8: 2405-2417). To ensure that the library does not accumulate false positives, much more conservative score cutoffs must be applied than is customary in proteomics experiments. NIST takes this one step further and throws away all “one-hit wonders,” i.e. peptide ions that have been identified only once among tens of millions of spectra. SpectraST implements user-defined options that can selectively remove all “one-hit wonders,” or only those that cannot be confirmed by another identification in the library (e.g. of the same sequence but a different charge state). Although it has been demonstrated that many of the “one-hit wonders” are in fact correct (25), it is not unreasonable to take an exceedingly conservative position and sacrifice some coverage for the sake of minimizing errors in the library, especially if the library is intended for public use. Throwing away “one-hit wonders” also has the benefit of ensuring that all spectra in the library are consensus spectra.

The second method of library quality control is to filter based on the properties of the spectra themselves. As mentioned before, noisy or contaminated spectra, even correctly identified ones, are undesirable in spectral libraries. In this regard, there is a rich literature on quality assessment of MS/MS spectra that a library builder can make use of (26: Gupta N., Pevzner P.A. False discovery rates of protein identifications: a strike against the two-peptide rule. J. Proteome Res. 2009; 8: 4173-4181; 27: Salmi J., Nyman T.A., Nevalainen O.S., Aittokallio T. Filtering strategies for improving protein identification in high-throughput MS/MS studies. Proteomics. 2009; 9: 848-860; 28: Renard B.Y., Kirchner M., Monigatti F., Ivanov A.R., Rappsilber J., Winter D., Steen J.A., Hamprecht F.A., Steen H. When less can yield more - Computational preprocessing of MS/MS spectra for peptide identification. Proteomics. 2009; 9: 4978-4984). However, most quality assessment tools are intended for filtering spectra prior to searching, whereas in library quality control the spectra are already identified. The identification can therefore be used to help determine whether a spectrum is noisy or contains a dominant impurity.
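
One way to use the identification in quality control is to ask how much of a spectrum's signal the assigned peptide can explain. The sketch below is a deliberately minimal illustration that considers only singly charged b and y ions of an unmodified peptide, using standard monoisotopic residue masses. The function names, the 0.5 m/z matching tolerance, and the 0.4 cutoff are hypothetical choices for this example and are not taken from any library-building tool.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def by_ions(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide."""
    masses = [MONO[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b ion, first i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # complementary y ion
    return ions


def unexplained_fraction(peaks, peptide, tol=0.5):
    """Fraction of total intensity not within `tol` m/z of any b/y ion."""
    ions = by_ions(peptide)
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        return 1.0
    explained = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - ion) <= tol for ion in ions)
    )
    return 1.0 - explained / total


def passes_quality_filter(peaks, peptide, max_unexplained=0.4, tol=0.5):
    """Keep a library spectrum only if enough of its signal is explained."""
    return unexplained_fraction(peaks, peptide, tol) <= max_unexplained


# Hypothetical spectrum assigned to PEPTIDE; the peak at 500.0 matches no ion.
peaks = [(98.06, 50.0), (148.06, 100.0), (227.10, 30.0), (500.0, 20.0)]
keep = passes_quality_filter(peaks, "PEPTIDE")
```

A realistic annotator would also consider higher charge states, neutral losses, and modifications before declaring a peak unexplained, so a filter this simple would reject many good spectra in practice.
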
SpectraST, for example, tries to assign every peak in a spectrum to a plausible fragment ion of the peptide, considering a wide range of possibilities including uncommon neutral losses. A filter can then be set such that any spectrum containing too many unexplained peaks, or too high a fraction of unexplained signal, is removed.

Lastly, for a library to be useful as a living resource, various meta-data about the library spectra should also be stored. This includes information about the sample sources, the search engines used to identify the spectra, and measures of confidence for the identifications. For large libraries built from many data sets, this information needs to be aggregated and summarized meaningfully as replicates from different sources are merged. The measure of confidence is also important as a means to convey uncertainty about a spectrum's identification, such that the spectral search engine can take it into account when assigning confidence to an identification made by spectral matching (17).

Several spectral search engines designed for proteomics applications have been developed in the past 5 years. In this section the focus is on the traditional, more well-established tools that perform straightforward spectral matching; newer methods that use libraries for peptide identification in other ways are discussed in a later section. The intention here is to first briefly describe each engine

Reference(s)