Article Open access Peer reviewed

A Simulated MS/MS Library for Spectrum-to-spectrum Searching in Large Scale Identification of Proteins

2008; Elsevier BV; Volume: 8; Issue: 4 Language: English

10.1074/mcp.m800384-mcp200

ISSN

1535-9484

Authors

Chia-Yu Yen, Karen Meyer-Arendt, Brian Eichelberger, S. Sun, Stéphane Houel, William M. Old, Rob Knight, Natalie G. Ahn, Lawrence Hunter, Katheryn A. Resing

Topic(s)

Metabolomics and Mass Spectrometry Studies

Abstract

Identifying peptides from mass spectrometric fragmentation data (MS/MS spectra) using search strategies that map protein sequences to spectra is computationally expensive. An alternative strategy uses direct spectrum-to-spectrum matching against a reference library of previously observed MS/MS that has the advantage of evaluating matches using fragment ion intensities and other ion types than the simple set normally used. However, this approach is limited by the small sizes of the available peptide MS/MS libraries and the inability to evaluate the rate of false assignments. In this study, we observed good performance of simulated spectra generated by the kinetic model implemented in MassAnalyzer (Zhang, Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908–3922; Zhang, Z. (2005) Prediction of low-energy collision-induced dissociation spectra of peptides with three or more charges. Anal. Chem. 77, 6364–6373) as a substitute for the reference libraries used by the spectrum-to-spectrum search programs X!Hunter and BiblioSpec and similar results in comparison with the spectrum-to-sequence program Mascot. We also demonstrate the use of simulated spectra for searching against decoy sequences to estimate false discovery rates. Although we found lower score discrimination with spectrum-to-spectrum searches than with Mascot, particularly for higher charge forms, comparable peptide assignments with low false discovery rate were achieved by examining consensus between X!Hunter and Mascot, filtering results by mass accuracy, and ignoring score thresholds. Protein identification results are comparable to those achieved when evaluating consensus between Sequest and Mascot. Run times with large scale data sets using X!Hunter with the simulated spectral library are 7 times faster than Mascot and 80 times faster than Sequest with the human International Protein Index (IPI) database. 
We conclude that simulated spectral libraries greatly expand the search space available for spectrum-to-spectrum searching while enabling principled analyses and that the approach can be used in consensus strategies for large scale studies while reducing search times.
Identification of proteins in complex samples is a major new area in bioinformatics. The most successful method currently available is shotgun proteomics where proteins are proteolyzed into peptides (usually by trypsin) followed by large scale sequencing of peptides by on-line chromatographic separation and fragmentation in a mass spectrometer (LC-MS/MS). The fragmentation process generates spectra (referred to as MS/MS spectra) from which peptide sequences consistent with the observed fragment ions can be identified (1. Aebersold R., Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207). A common computational strategy for matching an MS/MS spectrum to a peptide sequence involves interconverting spectral and sequence information (2. Sadygov R.G., Cociorva D., Yates J.R. III. Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book. Nat. Methods. 2004; 1: 195-202). Spectrum-to-sequence database search programs match peptide sequences to spectra in one of two ways: by 1) extracting sequence information from an observed spectrum and matching the sequences against peptides contained in a protein database or 2) converting peptide sequences from the protein database into simple spectra (e.g.
predicting a subset of possible b and y fragment ions generated by peptide bond cleavage) and matching the predicted fragment ions to those observed. Various scoring methods are then used to evaluate overlap between observed and predicted fragments, including use of probability functions or spectral similarity metrics (2).
An alternative strategy involves direct spectrum-to-spectrum matching of experimental spectra against reference MS/MS in a spectral library (3. Stein S.E., Scott D.R. Optimization and testing of mass spectra library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 1994; 5: 859-866). Programs that search sequence databases and spectral libraries are similar in many ways. Both match an experimental spectrum by selecting candidates from a reference database, use preprocessing and filtering functions to simplify the matching, and rank candidates using scores that evaluate the ability of the candidates to account for the observed fragment ions. However, spectrum-to-spectrum matching more easily allows use of fragment ion intensities as well as information on fragments other than the major b and y ions. Furthermore, spectral library searching is simpler conceptually and faster to execute (4. Craig R., Cortens J.C., Fenyo D., Beavis R.C. Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 2006; 5: 1843-1849) because it is unnecessary to interconvert between spectra and sequences during the scoring process.
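The candidate-selection and ranking steps described above can be sketched in a few lines. This is a minimal illustration, not the X!Hunter or BiblioSpec implementation: the 1-Da binning, the library layout (dictionaries with `peptide`, `mass`, and `peaks` keys), and the function names are all assumptions made for the example. The score is a normalized dot product over binned fragment intensities.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def dot_product_score(query, reference, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two binned spectra."""
    q = bin_peaks(query, bin_width)
    r = bin_peaks(reference, bin_width)
    numerator = sum(q[b] * r[b] for b in q if b in r)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return numerator / (norm_q * norm_r)


def search_library(query_mass, query_peaks, library, parent_tol=1.2):
    """Score library entries within parent_tol (Da) of the query parent mass.

    Returns (score, peptide) tuples sorted from best to worst score.
    """
    hits = []
    for entry in library:
        if abs(entry["mass"] - query_mass) <= parent_tol:
            score = dot_product_score(query_peaks, entry["peaks"])
            hits.append((score, entry["peptide"]))
    return sorted(hits, reverse=True)
```

Because the score uses intensities directly, two candidates that share the same fragment m/z values but differ in relative peak heights receive different scores, which is the property the text attributes to spectrum-to-spectrum matching.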
Because of these advantages, reference libraries of peptide MS/MS derived from experimental data sets are actively under development with libraries of human proteins now available from the National Institute of Standards and Technology (223,793 spectra), the MacCoss laboratory (320,658 spectra) (5. Frewen B.E., Merrihew G.E., Wu C.C., Noble W.S., MacCoss M.J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 2006; 78: 5678-5684), and the Beavis laboratory (297,519 spectra) (4). All contain mainly tryptic peptides but also include a significant number of non-tryptic and covalently modified peptides. One limitation of spectrum-to-spectrum matching using MS/MS libraries derived from observed spectra is that not all possible peptide sequences are represented. In the human database, tryptic products alone predict ∼3,300,000 spectra in the mass range detectable by MS when different charge forms are included. [Footnote 1: The number was based on an in silico generated peptide database. There are a total of 2,918,714 peptide sequences that were at least 9 amino acids long. Of these, 1,452,058 passed the missed cleavage rules of Yen et al. (14. Yen C.-Y., Russell S., Mendoza A.M., Meyer-Arendt K., Sun S., Cios K.J., Ahn N.G., Resing K.A. Improving sensitivity in shotgun proteomics using a peptide-centric database with reduced complexity: protease cleavage and SCX elution rules from data mining of MS/MS spectra. Anal. Chem. 2006; 78: 1071-1084). The simulated spectra are then generated for charges 1–3.]
Thus, the available reference libraries contain only a small fraction of all potentially observable spectra for tryptic peptides. Spectra are absent because they are rarely sampled experimentally, are derived from proteins of low abundance, are found in rare forms (e.g. alternative splice products), and/or ionize with low efficiency. Thus, only a few spectra might be available in a library to identify a protein, which is problematic because shotgun proteomics depends on peptide sampling. Furthermore, it is difficult to compare performances of spectrum-to-spectrum matching with spectrum-to-sequence strategies because of large differences in the sizes of their database search spaces and the unpredictable representation of a protein in the libraries. In this study, we hypothesized that spectrum-to-spectrum searching can be improved by using a spectral library composed of simulated spectra for all tryptic peptides in the human database. Simulated spectra were generated using a kinetic model, which simulates fragment ion intensities based on known mechanisms for peptide fragmentation (implemented in the MassAnalyzer program developed by Z. Zhang) (6. Zhang Z. Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 2004; 76: 3908-3922; 7. Zhang Z. Prediction of low-energy collision-induced dissociation spectra of peptides with three or more charges. Anal. Chem. 2005; 77: 6364-6373).
The MassAnalyzer MS/MS simulator was generated by fitting kinetic parameters to known peptide gas phase chemistries based on similarity scores to a large library of previously identified MS/MS spectra. The simulator satisfactorily predicts most cleavage patterns noted in manual analysis, such as enhanced fragmentation near Pro, Asp, Glu, His, and Ile/Leu/Val and near proton-donating side chains for peptide ions when proton mobility is limited. It also models common neutral losses, internal fragment ions, and losses of the C-terminal residue and provides a simple modeling of charge distribution between the various products, which is an important factor in simulating the spectra of MH⁺, MH₂²⁺, and MH₃³⁺ charge states. Results from Zhang (6) showed that the simulated spectra discriminate well when scored by similarity to peptide standards. We have further shown that the spectra can be used to evaluate chemical plausibility in a program designed to mimic manual analysis of MS/MS spectra (8. Sun S., Meyer-Arendt K., Eichelberger B., Brown R., Yen C.-Y., Old W.M., Pierce K., Cios K.J., Ahn N.G., Resing K.A. Improved validation of peptide MS/MS assignments using spectral intensity prediction. Mol. Cell. Proteomics. 2007; 6: 1-17) because it provides a way to use fragment ion intensities in evaluating candidate sequences. Rescoring Mascot or Sequest assignments based on similarity to simulated spectra yielded improved discrimination with large scale data sets and enabled validation of hits with low scores from the search program, thus identifying a large class of correct assignments that were normally rejected using conventional score thresholds (8. Sun S., Meyer-Arendt K., Eichelberger B., Brown R., Yen C.-Y., Old W.M., Pierce K., Cios K.J., Ahn N.G., Resing K.A.
Improved validation of peptide MS/MS assignments using spectral intensity prediction. Mol. Cell. Proteomics. 2007; 6: 1-17). Here we develop methods for using a library of MassAnalyzer-generated simulated spectra of peptides from human proteins for spectrum-to-spectrum matching. The simulated spectral library is 10 times larger than the current reference libraries, providing a search space comparable to that used by spectrum-to-sequence searching of protein databases. Methods were developed to partition this library to accommodate the memory limits of the computer and operating system and to manage searches of multiple files. This approach also allows generation of randomized or inverted sequence libraries for target-decoy searches (9. Elias J.E., Gygi S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods. 2007; 4: 207-214) to apply principled methods to evaluate the significance of peptide matches. Use of simulated spectra for spectrum-to-spectrum searching improves performance over available reference libraries and provides more rapid searching of large scale data sets.
LC-MS/MS sequencing of proteins was performed on a Thermo LCQ Classic mass spectrometer interfaced with an Agilent Cap1100 HPLC instrument (15 cm × 250-μm inner diameter, Jupiter C18, Phenomenex) (10. Resing K.A., Meyer-Arendt K., Mendoza A.M., Aveline-Wolf L.D., Jonscher K.R., Pierce K.G., Old W.M., Cheung H.T., Russel S., Wattawa J.L., Goehle G.R., Knight R.D., Ahn N.G. Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics. Anal. Chem. 2004; 76: 3556-3568) or a Thermo LTQ-Orbitrap mass spectrometer interfaced with an Eksigent NanoLC-2D HPLC instrument (10 cm × 75-μm inner diameter, Zorbax C18, Agilent). Three data sets were used in this study.
1) The “LCQ data set” contained 845 manually curated and validated tryptic peptide assignments derived from a data set of 4,051 MS/MS spectra collected on a trypsinized soluble protein extract from human K562 erythroleukemia cells using an LCQ Classic ion trap mass spectrometer as described previously (10). 2) The “ABRF data set” contained 5,854 MS/MS spectra from a tryptic digest of the ABRF standard mixture of 49 proteins (Sigma-Aldrich, UPS1) collected on an LTQ-Orbitrap. [Footnote 2: The abbreviations used are: ABRF, Association of Biomolecular Resource Facilities; FDR, false discovery rate (FP/(TP + FP)); FP, false positive; TP, true positive; TN, true negative; SS, simulated spectra; xRef, spectral library provided by X!Hunter; xSS, simulated spectral library for X!Hunter; bSS, simulated spectral library for BiblioSpec; SM, Sequest and Mascot consensus; XM, X!Hunter and Mascot consensus; CPU, central processing unit; RAM, random access memory; IPI, International Protein Index; MGF, Mascot generic format; ROC, receiver-operator characteristic.] A digest of 200 fmol of total protein was loaded, and peptides were eluted with a gradient of 2–40% acetonitrile in 0.1% formic acid, water over 150 min. MS/MS were collected enabling monoisotopic precursor and charge selection settings. Each MS scan was followed by five LTQ MS/MS scans targeting the top five most intense ions with a dynamic exclusion of 180 s and a repeat count of 2. The maximum injection time for Orbitrap parent scans was 500 ms with two microscans and an automatic gain control of 1 × 10⁶.
The maximum injection time for the LTQ MS/MS was 250 ms with three microscans and an automatic gain control of 1 × 10⁴. The normalized collision energy was 35% with activation Q of 0.25 for 30 ms. 3) The “large scale data set” contained 90,411 MS/MS spectra collected on tryptic peptides derived from cytosolic protein extracts of WM115 human melanoma cells fractionated by quaternary aminoethyl anion exchange (Mono Q) fast protein liquid chromatography. LC-MS/MS data collection was carried out on each fraction using an LTQ-Orbitrap as above but with reverse phase elution from 2 to 40% acetonitrile (0.1% formic acid) in 120 min, running each sample three times and scanning different mass ranges (350–708, 700–1108, and 1100–1600 Da).
Spectrum-to-spectrum search programs used in this study were X!Hunter, which was developed by Beavis and co-workers (4), and BiblioSpec, which was developed by MacCoss and co-workers (5). X!Hunter v.Win32 July 1, 2007 was tested on an Intel Pentium 4 3.2-GHz CPU (hyperthread) with 2-GB RAM using Windows XP Professional, and BiblioSpec v.1.0 was tested on a Dual Intel Xeon 2.4-GHz CPU with 2-GB RAM using Linux (Fedora Core 4). Results were compared with two spectrum-to-sequence search programs: Mascot v.2.2 (11. Pappin D.J.C., Hojrup P., Bleasby A.J. Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol.
1993; 3: 327-332), run on a Dual Intel Xeon 2.8-GHz CPU (hyperthread) with 8-GB RAM and Windows 2003, and TurboSequest (v.27 revision 12) (12. Eng J.K., McCormack A.L., Yates J.R. III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989) run on an Intel Pentium 4 3.0-GHz CPU (hyperthread) with 1-GB RAM using Windows XP Professional.
Error tolerances were ±1.2 Da for the parent ion using Mascot or X!Hunter and ±0.8 Da for fragment ions using Mascot (optimized for sensitivity and discrimination) or ±0.5 Da for fragment ions using X!Hunter (this default value is in the code and is not controlled by the user setting in the parameter file). In BiblioSpec, the fragment ion mass tolerance is controlled by fragment ion binning during spectral preprocessing, producing mass tolerances varying with m/z that cannot be altered by the user. BiblioSpec also bases parent ion mass tolerances on m/z value, and tolerances used for simulated spectral library searching were ±1.2, ±0.6, and ±0.4 Da, respectively, for MH⁺, MH₂²⁺, and MH₃³⁺ ions. Mascot and X!Hunter convert the experimentally observed parent ion m/z to MH⁺ (M is the uncharged peptide mass) and then base the mass tolerance on that value, which produces varying tolerances for different charge forms. Therefore, we partitioned the simulated spectral library by charge and set different tolerances for each charge form to compare results between these programs. In addition to mass tolerances, we allowed up to two missed cleavages and fixed carbamidomethyl-Cys modification for protein database searches. For spectral library searches, there is no parameter to exclude other modifications; therefore, those MS/MS cases are removed by postfiltering.
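The relationship between the two ways of stating the parent ion tolerance can be made concrete with a short calculation. The sketch below assumes the standard conversion MH⁺ = (m/z) × z − (z − 1) × m(proton); the function names and the proton mass constant are supplied for illustration and do not come from any of the search programs. Under that assumption, a fixed ±1.2-Da window on MH⁺ corresponds to ±1.2/z on the m/z scale, which matches the ±1.2, ±0.6, and ±0.4 values quoted for charges 1 to 3.

```python
PROTON_MASS = 1.007276  # Da; mass of a proton (illustrative constant)


def mz_to_mh(mz, charge):
    """Convert an observed parent m/z at a given charge to the singly protonated mass MH+."""
    return mz * charge - (charge - 1) * PROTON_MASS


def mz_tolerance(mh_tolerance, charge):
    """Width on the m/z scale that is equivalent to a given tolerance on MH+."""
    return mh_tolerance / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(mz_tolerance(1.2, z), 3))
```

The same ±1.2-Da window on MH⁺ therefore tightens on the m/z axis as charge increases, which is why the library was partitioned by charge before per-charge tolerances were set.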
Scoring methods utilized default functions: Mascot reports a Mowse score (also called ion score), BiblioSpec reports a dot product similarity score, $\sum (I_i \times I_j) / [(\sum I_i^2)^{1/2} \times (\sum I_j^2)^{1/2}]$, between reference library and experimental spectra, and X!Hunter converts the weighted dot product score into an expectation score and then adds probability-based scoring to report a modified expectation score. To make comparisons equivalent, some functions in the expectation scoring were modified or turned off. In the X!Hunter reference library derived from observed spectra, each entry is assigned an initial expectation value as part of the final expectation score calculation. Because this information was not available for the simulated spectra, a default value of 0.001 was used for searches with both the observed and simulated spectral libraries. In addition, the partitioning of the input data sets (described below) required turning off a scoring correction for the number of total MS/MS analyzed. These changes had no effect on the search results (not shown), although the use of 0.001 as the initial expectation value shifted the range of the final expectation scores below a cutoff value, which varied between data sets.
Reference libraries of observed spectra for X!Hunter and BiblioSpec were downloaded from the developers’ Web sites. The simulated spectral libraries were based on the human IPI protein database v.3.29 (13. Kersey P.J., Duarte J., Williams A., Karavidopoulou Y., Birney E., Apweiler R. The international protein index: an integrated database for proteomics experiments. Proteomics. 2004; 4: 1985-1988) or a “decoy sequence database” where each protein sequence was read in reverse. A database of sequences was generated for tryptic peptides with mass between 900 and 4500 Da, allowing up to two missed cleavages and removing unlikely missed cleavage products (14. Yen C.-Y., Russell S., Mendoza A.M., Meyer-Arendt K., Sun S., Cios K.J., Ahn N.G.,
Resing K.A. Improving sensitivity in shotgun proteomics using a peptide-centric database with reduced complexity: protease cleavage and SCX elution rules from data mining of MS/MS spectra. Anal. Chem. 2006; 78: 1071-1084). Charge forms up to MH₃³⁺ were included for each sequence provided that a sufficient number of basic residues were present. For this in silico generated peptide database, there are a total of 2,918,714 peptide sequences that were at least 9 amino acids long. Of these, 1,452,058 passed the missed cleavage rules of Yen et al. (14). Simulation of spectra utilized the program MassAnalyzer (v.2.1), which generates the simulated spectra as DTA files. To convert the simulated spectra into a spectral library, the DTA file for each simulated spectrum was converted to the appropriate text file format (extensible markup language (XML) for X!Hunter and spectrum-sequence list (SSL) + MS2 for BiblioSpec) and then converted into the required binary format for each program. The number of stored ions had to be specified for the X!Hunter library format and was set to the default of 20 ions.
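The peptide database construction described above (tryptic cleavage, up to two missed cleavages, a 900–4500 Da mass window, fixed carbamidomethyl-Cys, and a reversed-sequence decoy) can be sketched as follows. This is an illustrative approximation only: the no-cleavage-before-proline rule is a common convention assumed here, the missed cleavage rules of Yen et al. (14) and the basic-residue check for charge forms are not implemented, and all function names are hypothetical.

```python
# Monoisotopic residue masses (Da). Cys carries the fixed
# carbamidomethyl modification (+57.02146 Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919 + 57.02146,
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
    "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide (MH+)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def cleave(protein):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    return pieces


def tryptic_peptides(protein, max_missed=2, min_mh=900.0, max_mh=4500.0):
    """Peptides with up to max_missed missed cleavages inside the MH+ window."""
    pieces = cleave(protein)
    peptides = set()
    for i in range(len(pieces)):
        for j in range(i, min(i + max_missed + 1, len(pieces))):
            peptide = "".join(pieces[i:j + 1])
            # Skip peptides containing non-standard residue codes.
            if not all(aa in RESIDUE_MASS for aa in peptide):
                continue
            if min_mh <= mh_mass(peptide) <= max_mh:
                peptides.add(peptide)
    return peptides


def decoy_sequence(protein):
    """Decoy protein obtained by reading the sequence in reverse."""
    return protein[::-1]
```

Digesting `decoy_sequence(protein)` with the same `tryptic_peptides` settings yields a decoy peptide list of comparable size, which is the property a target-decoy search relies on.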
Table I shows versions and sizes of the public domain libraries and the SS library used in this study.

Table I. Spectral libraries of human peptide sequences used in this study

| | SS library | X!Hunter library (xRef) | BiblioSpec library |
| --- | --- | --- | --- |
| Version | IPI v.3.29 | 9/28/2007 | v.23.2, 11/5/2007 |
| Number of spectra | 3,306,625 (a) | 320,658 (b) | 297,519 |
| Number of unique sequences | 1,452,058 | 122,314 | 292,337 |
| Number of ions per spectrum | No limit | Up to 20 | Unknown (c) |

(a) The in silico digested peptide sequences are limited by mass within 900–4500 Da, number of missed cleavages up to 2, charge state up to 3, allowing fixed carbamidomethyl-Cys, and passing missed cleavage rules (14).
(b) xRef is managed by protein entry, so the number of spectra may include duplications.
(c) The number of ions per spectrum is spectrum-dependent and unknown for entries in the BiblioSpec library.

Input experimental MS/MS spectra were extracted by extract_msn.exe (distributed with Bioworks 3.2) using the parameters -M1.4 -B85 -T4500 -S5 -G1 -I35 -C0 for LCQ data and the parameters -M0.2 -B85 -T4500 -S0 -G1 -I35 -C0 -P2 for LTQ data. Files were then formatted as MGF files for both X!Hunter and Mascot, and a converter was developed to change MGF-formatted experimental data to the MS2 format (15. McDonald W.H.,
Tabb D.L., Sadygov R.G., MacCoss M.J., Venable J., Graumann J., Johnson J.R., Cociorva D., Yates J.R. III. MS1, MS2, and SQT-three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications. Rapid Commun. Mass Spectrom. 2004; 18: 2162-2168) used by BiblioSpec. X!Hunter XML and BiblioSpec SQT (15) output files were converted into an in-house MSPlus format (10) consisting of a comma-separated value (.csv) file with designated columns used in our regular work flow. DTAs for charge forms ≥4+ were excluded (less than 12% of spectra in data sets used in this study).
Spectral library search applications require libraries that are preloaded into system memory. Because it is difficult to accommodate large libraries using 32-bit operating systems, searches against the simulated spectral library were performed by partitioning the input data set and library as described previously for peptide-centric database searching (14. Yen C.-Y., Russell S., Mendoza A.M., Meyer-Arendt K., Sun S., Cios K.J., Ahn N.G., Resing K.A. Improving sensitivity in shotgun proteomics using a peptide-centric database with reduced complexity: protease cleavage and SCX elution rules from data mining of MS/MS spectra. Anal. Chem.
2006; 78: 1071-1084). MGF files of input experimental data and simulated spectral libraries were partitioned by charge and by MH⁺ with masses overlapping between adjacent partitions to accommodate the parent mass tolerance. For searching with data sets containing multiple files, we developed an automated graphical user interface tool for X!Hunter that executes the searches and generates a single output file. This tool divides input MGF files by parent m/z and charge criteria, makes X!Hunter parameter files for the divided MGF files, invokes X!Hunter searches pairing the divided MGF files to the corresponding partition of the spectral library, and generates an MSPlus-formatted output by extracting information from the X!Hunter output files. The simulated library is available for download from the X!Hunter Web site along with the graphical user interface tool and documentation. Other results files can be obtained by request from the corresponding author.
The experiments to evaluate the use of the MassAnalyzer-generated simulated spectra for spectrum-to-spectrum matching were carried out using two spectral library search programs, X!Hunter and BiblioSpec. These programs, the library-generating utilities, and the X!Hunter reference spectral library of observed MS/MS (referred to as “xRef”

Reference(s)