Article, open access, peer reviewed

The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search

2010; Elsevier BV; Volume: 9; Issue: 12; Language: English

10.1074/mcp.m110.003731

ISSN

1535-9484

Authors

Sangtae Kim, Nikolai Mischerikow, Nuno Bandeira, J. Daniel Navarro, Louis Wich, Shabaz Mohammed, Albert J. R. Heck, Pavel A. Pevzner

Topic(s)

Analytical Chemistry and Chromatography

Abstract

Recent emergence of new mass spectrometry techniques (e.g. electron transfer dissociation, ETD) and improved availability of additional proteases (e.g. Lys-N) for protein digestion in high-throughput experiments raised the challenge of designing new algorithms for interpreting the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision induced dissociation (CID) of tryptic peptides and are largely based on expert knowledge about fragmentation of tryptic peptides (rather than machine learning techniques) to design CID-specific scoring functions. As a result, the performance of these algorithms is suboptimal for new mass spectrometry technologies or nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.), and present a new database search tool MS-GFDB based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra or peptides digested with Lys-N. For example, in the case of ETD spectra, the number of tryptic and Lys-N peptides identified by MS-GFDB increased by a factor of 2.7 and 2.6 as compared with Mascot. Moreover, even following a decade of Mascot developments for analyzing CID spectra of tryptic peptides, MS-GFDB (that is not particularly tailored for CID spectra or tryptic peptides) resulted in 28% increase over Mascot in the number of peptide identifications. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g. CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches.
Since the introduction of electron capture dissociation (ECD) [footnote 1] in 1998 (1. Zubarev R. Kelleher N. McLafferty F. Electron capture dissociation of multiply charged protein cations. A nonergodic process. J. Am. Chem. Soc. 1998; 120: 3265-3266), electron-based peptide dissociation technologies have played an important role in analyzing intact proteins and post-translational modifications (2. Cooper H.J. Håkansson K. Marshall A.G. The role of electron capture dissociation in biomolecular analysis. Mass Spectrom. Rev. 2005; 24: 201-222). However, until recently, this research-grade technology was available only to a small number of laboratories because it was commercially unavailable, required experience for operation, and could be implemented only with expensive FT-ICR instruments. The discovery of electron-transfer dissociation (ETD) (3. Syka J.E. Coon J.J. Schroeder M.J. Shabanowitz J. Hunt D.F. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 2004; 101: 9528-9533) enabled an ECD-like technology to be implemented in (relatively cheap) ion-trap instruments. Nowadays, many researchers are employing the ETD technology for tandem mass spectra generation (4. Taverna S.D. Ueberheide B.M. Liu Y. Tackett A.J. Diaz R.L. Shabanowitz J. Chait B.T. Hunt D.F. Allis C.D. Long-distance combinatorial linkage between methylation and acetylation on histone h3 n termini. Proc. Natl. Acad. Sci. U.S.A. 2007; 104: 2086-2091; 5. Khidekel N. Ficarro S.B. Clark P.M. Bryan M.C. Swaney D.L. Rexach J.E. Sun Y.E. Coon J.J. Peters E.C. Hsieh-Wilson L.C. Probing the dynamics of o-glcnac glycosylation in the brain using quantitative proteomics. Nat. Chem. Biol. 2007; 3: 339-348; 6. Appella E. Anderson C.W. New prospects for proteomics–electron-capture (ecd) and electron-transfer dissociation (etd) fragmentation techniques and combined fractional diagonal chromatography (cofradic). FEBS J. 2007; 274: 6255; 7. Molina H. Horn D.M. Tang N. Mathivanan S. Pandey A. Global proteomic profiling of phosphopeptides using electron transfer dissociation tandem mass spectrometry. Proc. Natl. Acad. Sci. U.S.A. 2007; 104: 2199-2204; 8. Altelaar A.F. Mohammed S. Brans M.A. Adan R.A. Heck A.J. Improved identification of endogenous peptides from murine nervous tissue by multiplexed peptide extraction methods and multiplexed mass spectrometric analysis. J. Proteome Res. 2009; 8: 870-876; 9. Mohammed S. Lorenzen K. Kerkhoven R. van Breukelen B. Vannini A. Cramer P. Heck A.J. Multiplexed proteomics mapping of yeast rna polymerase ii and iii allows near-complete sequence coverage and reveals several novel phosphorylation sites. Anal. Chem. 2008; 80: 3584-3592).

Footnote 1: The abbreviations used are: ECD, electron capture dissociation; ETD, electron transfer dissociation; MS/MS, tandem mass spectrometry; CID, collision induced dissociation; FDR, false discovery rate; PSM, peptide-spectrum match; PS2M, peptide-spectrum-spectrum match; SCX, strong cation exchange; PRM, prefix-residue mass; PTM, post-translational modification; HPLC, high pressure liquid chromatography.

Although the hardware technologies to generate ETD spectra are maturing rapidly, software technologies to analyze ETD spectra are still in their infancy. There are two major approaches to analyzing tandem mass spectra: de novo sequencing and database search. Both approaches find the best-scoring peptide either among all possible peptides (de novo sequencing) or among all peptides in a protein database (database search).
Although de novo sequencing is emerging as an alternative to database search, database search remains a more accurate (and thus preferred) method of spectral interpretation, so here we focus on the database search approach.

Numerous database search engines are currently available, including SEQUEST (10. Eng J.K. McCormack A.L. Yates J.R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989), Mascot (11. Perkins D.N. Pappin D.J. Creasy D.M. Cottrell J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999; 20: 3551-3567), OMSSA (12. Geer L.Y. Markey S.P. Kowalak J.A. Wagner L. Xu M. Maynard D.M. Yang X. Shi W. Bryant S.H. Open mass spectrometry search algorithm. J. Proteome Res. 2004; 3: 958-964), X!Tandem (13. Craig R. Beavis R.C. Tandem: matching proteins with tandem mass spectra. Bioinformatics. 2004; 20: 1466-1467), and InsPecT (14. Tanner S. Shu H. Frank A. Wang L.C. Zandi E. Mumby M. Pevzner P.A. Bafna V. Inspect: identification of posttranslationally modified peptides from tandem mass spectra. Anal. Chem. 2005; 77: 4626-4639). However, most of them are inadequate for the analysis of ETD spectra because they are optimized for collision induced dissociation (CID) spectra, which show different fragmentation propensities than ETD spectra. Additionally, the existing tandem mass spectrometry (MS/MS) tools are biased toward the analysis of tryptic peptides because trypsin is usually used for CID, and are thus not suitable for the analysis of nontryptic peptides that are common for ETD. Therefore, even though some database search engines support the analysis of ETD spectra (e.g. SEQUEST, Mascot, and OMSSA), their performance remains suboptimal when it comes to analyzing ETD spectra. Recently, an ETD-specific database search tool (Z-Core) was developed; however, it does not significantly improve over OMSSA (15. Sadygov R.G. Good D.M. Swaney D.L. Coon J.J. A new probabilistic database search algorithm for etd spectra. J. Proteome Res. 2009; 8: 3198-3205).

We present a new database search tool (MS-GFDB) that significantly outperforms existing database search engines in the analysis of ETD spectra, and performs equally well on nontryptic peptides. MS-GFDB employs the generating function approach (MS-GF) that computes rigorous p values of peptide-spectrum matches (PSMs) based on the spectrum-specific score histogram of all peptides (16. Kim S. Gupta N. Pevzner P.A. Spectral probabilities and generating functions of tandem mass spectra: A strike against decoy databases. J. Proteome Res. 2008; 7: 3354-3363) [footnote 2]. MS-GF p values are dependent only on the PSM (and not on the database), and thus can be used as an alternative scoring function for the database search.

Footnote 2: The term "p-value" here and the term "spectral probability" used in Kim et al., 2008 (16) are synonymous. Throughout the paper, we use "p-value," because it is more generally used.

Computing p values requires a scoring model evaluating qualities of PSMs. MS-GF adopts a probabilistic scoring model (MS-Dictionary scoring model) described in Kim et al., 2009 (17. Kim S. Gupta N. Bandeira N. Pevzner P.A. Spectral dictionaries: Integrating de novo peptide sequencing with database search of tandem mass spectra. Mol. Cell. Proteomics. 2009; 8: 53-69), considering multiple features including product ion types, peak intensities and mass errors. To define the parameters of this scoring model, MS-GF only needs a set of training PSMs [footnote 3]. This set of PSMs can be obtained in a variety of ways: for example, one can generate CID/ETD pairs and use peptides identified by CID to form PSMs for ETD. Alternatively, one can generate spectra from a purified protein (when PSMs can be inferred from the accurate parent mass alone) or use a previously developed (not necessarily optimal) tool to generate training PSMs. From these training PSMs, MS-GF automatically derives scoring parameters without assuming any prior knowledge about the specifics of a particular peptide fragmentation method (e.g. ETD, CID, etc.) and/or proteolytic origin of the peptides. MS-GF was originally designed for the analysis of CID spectra, but now it has been extended to other types of spectra generated by various fragmentation techniques and/or various enzymes. We show that MS-GF can be successfully applied to novel types of spectra (e.g. ETD of Lys-N peptides (18. Taouatas N. Drugan M.M. Heck A.J. Mohammed S. Straightforward ladder sequencing of peptides using a lys-n metalloendopeptidase. Nat. Methods. 2008; 5: 405-407; 19. Eppstein D. Targeted scx based peptide fractionation for optimal sequencing by collision induced, and electron transfer dissociation. J. Proteomics Bioinform. 2008; 1: 379-388)) by simply retraining scoring parameters without any modification.

Footnote 3: A thousand PSMs of unique peptides is usually sufficient.
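The idea of a p value derived from the spectrum-specific score histogram of all peptides can be illustrated with a toy dynamic program. The sketch below is not the MS-GF implementation: it assumes integer residue masses, a uniform residue probability, and an additive score attached to each prefix mass, and the names `score_distribution`, `spectral_p_value`, and `node_scores` are hypothetical.

```python
from collections import defaultdict


def score_distribution(parent_mass, residue_masses, node_scores, residue_prob):
    """Return {score: probability} over all residue sequences of total mass parent_mass.

    Toy assumptions (not taken from the paper): masses are integers, every
    residue has the same probability, and a sequence's score is the sum of
    node_scores[m] over its prefix masses m.
    """
    # dist[m] maps a score to the total probability of prefixes with mass m.
    dist = [defaultdict(float) for _ in range(parent_mass + 1)]
    dist[0][0] = 1.0
    for m in range(1, parent_mass + 1):
        gain = node_scores.get(m, 0)
        for r in residue_masses:
            if r > m:
                continue
            # Extend every prefix of mass m - r by one residue of mass r.
            for s, p in dist[m - r].items():
                dist[m][s + gain] += p * residue_prob
    return dict(dist[parent_mass])


def spectral_p_value(dist, threshold):
    """Total probability of sequences scoring at least threshold."""
    return sum(p for s, p in dist.items() if s >= threshold)


# Toy alphabet with two "residues" of mass 1 and 2, each with probability 0.5.
toy = score_distribution(3, [1, 2], {1: 1, 2: 2}, 0.5)
print(toy, spectral_p_value(toy, 2))
```

Because the table is built from the spectrum's own scores and the residue alphabet, the resulting value depends only on the spectrum and the score threshold, not on the protein database, which matches the property stated above.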
Note that although the same scoring model is used for different types of spectra, the parameters derived to score different types of spectra are dissimilar.

We compared the performance of MS-GFDB with Mascot on a large ETD data set and found that it generated many more peptide identifications for the same false discovery rates (FDR). For example, at 1% peptide level FDR [footnote 4], MS-GFDB identified 9450 unique peptides from 81,864 ETD spectra of Lys-N peptides whereas Mascot only identified 3672 unique peptides, a ≈160% increase in the number of peptide identifications (a similar improvement is observed for ETD spectra of tryptic peptides). At 1% spectrum level FDR, MS-GFDB identified 22,003 spectra, whereas Mascot identified 9027 spectra, a 140% increase in the number of identified spectra for ETD spectra of Lys-N peptides. MS-GFDB also showed a significant 28% improvement in the number of identified peptides from CID spectra of tryptic peptides (16,203 peptides as compared with 12,658 peptides identified by Mascot).

Footnote 4: The peptide level FDR is defined as the number of unique peptides in the decoy database over the number of unique peptides in the target database at a certain threshold.

The ETD technology complements rather than replaces CID because both technologies have some advantages: CID for smaller peptides with small charges, ETD for larger and multiply charged peptides (20. Zubarev R.A. Zubarev A.R. Savitski M.M. Electron capture/transfer versus collisionally activated/induced dissociations: solo or duet? J. Am. Soc. Mass Spectrom. 2008; 19: 753-761; 21. Swaney D.L. McAlister G.C. Coon J.J. Decision tree-driven tandem mass spectrometry for shotgun proteomics. Nat. Methods. 2008; 5: 959-964). An alternative way to utilize ETD is to use it in conjunction with CID because CID and ETD generate complementary sequence information (20; 22. Nielsen M.L. Savitski M.M. Zubarev R.A. Improving protein identification using complementary fragmentation techniques in fourier transform mass spectrometry. Mol. Cell. Proteomics. 2005; 4: 835-845; 23. Savitski M.M. Nielsen M.L. Kjeldsen F. Zubarev R.A. Proteomics-grade de novo sequencing approach. J. Proteome Res. 2005; 4: 2348-2354). ETD-enabled instruments often support generating both CID and ETD spectra (CID/ETD pairs) for the same peptide. Although the CID/ETD pairs promise a great improvement in peptide identification, the full potential of such pairs has not yet been realized. In the case of de novo sequencing, tools utilizing CID/ETD pairs indeed result in more accurate de novo peptide sequencing than traditional CID-based algorithms (23; 24. Datta R. Bern M. Spectrum fusion: Using multiple mass spectra for de novo peptide sequencing. J. Comput. Biol. 2009; 16: 1169-1182; 25. Bertsch A. Leinenbach A. Pervukhin A. Lubeck M. Hartmer R. Baessmann C. Elnakady Y.A. Müller R. Böcker S. Huber C.G. Kohlbacher O. De novo peptide sequencing by tandem ms using complementary cid and electron transfer dissociation. Electrophoresis. 2009; 30: 3736-3747). However, in the case of database search, the argument that the use of CID/ETD pairs improves peptide identifications remains poorly substantiated.
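The peptide level FDR definition and the MS-GFDB versus Mascot improvements quoted earlier in this section can be made concrete with a short sketch. The function names, the `(peptide, p_value)` tuple layout, and the toy PSMs are assumptions made for illustration; only the identification counts come from the text.

```python
def peptide_level_fdr(target_psms, decoy_psms, p_threshold):
    """Unique decoy peptides over unique target peptides passing a p value threshold.

    Each PSM is a (peptide, p_value) tuple; a smaller p value is a better match.
    """
    target = {pep for pep, p in target_psms if p <= p_threshold}
    decoy = {pep for pep, p in decoy_psms if p <= p_threshold}
    return len(decoy) / len(target) if target else 0.0


def percent_increase(new, baseline):
    """Relative increase of new over baseline, in percent."""
    return 100.0 * (new - baseline) / baseline


# Toy PSMs (hypothetical): one peptide is matched by two spectra.
target = [("PEPTIDE", 1e-12), ("PEPTIDE", 1e-10), ("SEQUENCE", 1e-9), ("LOWSCORE", 1e-3)]
decoy = [("EDITPEP", 1e-4), ("ECNEUQES", 1e-9)]
print(peptide_level_fdr(target, decoy, 1e-8))

# Identification counts reported in the text (MS-GFDB, Mascot).
for label, (msgfdb, mascot) in {
    "ETD Lys-N unique peptides": (9450, 3672),
    "ETD Lys-N spectra": (22003, 9027),
    "CID tryptic unique peptides": (16203, 12658),
}.items():
    print(f"{label}: {percent_increase(msgfdb, mascot):.0f}% increase")
```

Computed this way, the three comparisons come to roughly 157%, 144%, and 28%, consistent with the rounded figures quoted in the text.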
A few tools have been developed to use CID/ETD (or CID/ECD) pairs for the database search, but they are limited to preprocessing or postprocessing of the spectral data before or after running a traditional database search tool (26. Molina H. Matthiesen R. Kandasamy K. Pandey A. Comprehensive comparison of collision induced dissociation and electron transfer dissociation. Anal. Chem. 2008; 80: 4825-4835; 27. Good D.M. Wenger C.D. McAlister G.C. Bai D.L. Hunt D.F. Coon J.J. Post-acquisition etd spectral processing for increased peptide identifications. J. Am. Soc. Mass Spectrom. 2009; 20: 1435-1440). Nielsen et al., 2005 (22) pioneered the combined use of CID and ECD for the database search. Given a CID/ECD pair, they generated a combined spectrum comprised only of complementary pairs of peaks, and searched it with Mascot [footnote 5]. However, this approach is hard to generalize to less accurate CID/ETD pairs generated by ion-trap instruments because there is a higher chance that the identified complementary pairs of peaks are spurious.

Footnote 5: The combined spectrum is a pseudo-spectrum generated from the set of pairs of peaks supporting the same backbone cleavage. The pair may come from the same spectrum (e.g. two peaks whose masses sum to the parent mass) or from different spectra (e.g. a peak from the CID spectrum and a peak from the ECD spectrum with a mass difference of 16.02 Da, representing a possible pair of y and z fragment ions).
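The two kinds of peak pairs described in footnote 5 can be sketched as follows. This is an illustration of the pairing idea only, not the procedure of Nielsen et al.: the function names and the `tol` tolerance are hypothetical, peaks are treated as plain mass values, and `parent_mass` stands for whatever total the two fragment masses are expected to add up to (in practice this includes charge-carrier offsets).

```python
Y_MINUS_Z_DA = 16.02  # y/z mass difference quoted in the text


def cross_spectrum_pairs(cid_masses, ecd_masses, tol=0.5):
    """Pair each CID peak with ECD peaks lying about 16.02 Da below it."""
    pairs = []
    for y in cid_masses:
        for z in ecd_masses:
            if abs((y - z) - Y_MINUS_Z_DA) <= tol:
                pairs.append((y, z))
    return pairs


def same_spectrum_pairs(masses, parent_mass, tol=0.5):
    """Pair peaks of one spectrum whose masses sum to parent_mass."""
    ordered = sorted(masses)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if abs(a + b - parent_mass) <= tol:
                pairs.append((a, b))
    return pairs


print(cross_spectrum_pairs([500.30, 700.40], [484.28, 650.00]))
print(same_spectrum_pairs([200.1, 300.2, 799.9], 1000.0))
```

A wider tolerance admits more chance matches, which is the concern raised above for less accurate ion-trap data.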
More importantly, using traditional MS/MS tools (such as Mascot) for the database search of the combined spectrum is inappropriate, because they are not optimized for analyzing such combined spectra; a better approach would be to develop a new database search tool tailored for the combined spectrum. Recently, Molina et al., 2008 (26) studied database search of CID/ETD pairs using Spectrum Mill (Agilent Technologies, Santa Clara, CA) and came to a counterintuitive conclusion that using only CID spectra identifies 12% more unique peptides than using CID/ETD pairs. We believe that it is an acknowledgment of limitations of the traditional MS/MS database search tools for the analysis of multiple spectra generated from a single peptide.

In this paper, we modify the generating function approach for interpreting CID/ETD pairs and further apply it to improve the database search with CID/ETD pairs. In contrast to previous approaches, our scoring is specially designed to interpret CID/ETD pairs and can be generalized to analyzing any type of multiple spectra generated from a single peptide. When CID/ETD pairs from trypsin digests were used, MS-GFDB identified 13% and 27% more peptides compared with the case when only CID spectra and only ETD spectra were used, respectively. The difference was even more prominent when CID/ETD pairs from Lys-N digests were used, with 41% and 33% improvement over CID only and ETD only, respectively.

Assigning a p value to a PSM has greatly helped researchers to evaluate the quality of peptide identifications. We now turn to the problem of assigning a p value to a peptide-spectrum-spectrum match (PS2M) when two spectra in PS2M are generated by different fragmentation technologies (e.g. ETD and CID).
We argue that assigning statistical significance to a PS2M (or even PSnM) is a prerequisite for rigorous CID/ETD analyses. To our knowledge, MS-GFDB is the first tool to generate statistically rigorous p values of PSnMs.

The MS-GFDB executable and source code are available at the website of the Center for Computational Mass Spectrometry at UCSD (http://proteomics.ucsd.edu). It takes a set of spectra (CID, ETD, or CID/ETD pairs) and a protein database as an input and outputs peptide matches. If the input is a set of CID/ETD pairs, it outputs the best scoring peptide matches and their p values (1) using only CID spectra, (2) using only ETD spectra, and (3) using combined spectra of CID/ETD pairs.

EXPERIMENTAL PROCEDURES

Digestion of Cell Lysate

HEK293 cells were grown to confluence, harvested and resuspended in lysis buffer (50 ammonium bicarbonate, 8 M urea, Complete EDTA-free protease inhibitor mix (Roche Applied Science), 5 mM potassium phosphate, 1 mM potassium fluoride, and 1 mM sodium orthovanadate) and incubated for 20 min at 4 °C. An insoluble fraction was spun down at 1000 × g for 10 min at 4 °C and the protein content of the supernatant was determined using the 2DQuant Kit (GE Healthcare). Per 1 mg of lysate, 45 mM dithiothreitol was used for reduction (30 min at 50 °C) and 100 mM iodoacetamide for subsequent alkylation (30 min at RT). Trypsin digests were generated by digestion of 1 mg cell lysate with 1.25 μg Lys-C for 4 h at RT followed by dilution to 2 M urea and digestion with 15 μg trypsin for 16 h at 37 °C. Lys-N digests were made by digestion of 1 mg cell lysate with 5 μg Lys-N for 4 h at RT, dilution to 2 M urea, and another digestion with 5 μg Lys-N for 16 h at 37 °C.

Peptide Prefractionation by Strong Cation Exchange (SCX)

Fractionation of peptides was performed as described earlier (28. Taouatas N. Altelaar A.F. Drugan M.M. Helbig A.O. Mohammed S. Heck A.J. Strong cation exchange-based fractionation of lys-n-generated peptides facilitates the targeted analysis of post-translational modifications. Mol. Cell. Proteomics. 2009; 8: 190-200; 29. Gauci S. Helbig A.O. Slijper M. Krijgsveld J. Heck A.J. Mohammed S. Lys-n and trypsin cover complementary parts of the phosphoproteome in a refined scx-based approach. Anal. Chem. 2009; 81: 4493-4501). In detail, digests were acidified with formic acid and loaded onto two C18 cartridges using an Agilent 1100 high pressure liquid chromatography (HPLC) system operated at 100 μl/min with 0.05% formic acid in water. Peptides were then eluted from the C18 cartridges using 80% acetonitrile and 0.05% formic acid in water onto a PolySULFOETHYL A column (200 mm × 2.1 mm column, PolyLC). Separation of different peptide populations was performed at 200 μl/min using a nonlinear gradient as follows: 0 to 10 min 100% solvent A (5 mM KH2PO4, 30% acetonitrile, 0.05% formic acid), 10 to 15 min from 0% to 26% solvent B (350 mM KCl, 5 mM KH2PO4, 30% acetonitrile, 0.05% formic acid), 15 to 40 min from 26% to 35% solvent B, 40 to 45 min from 35% to 60% solvent B, and 45 to 49 min from 60% to 100% solvent B. Fractions were collected in 1 min intervals for 40 min, dried down in a vacuum centrifuge, and resuspended in 10% formic acid.

Mass Spectrometry

SCX fractions were analyzed on a reversed-phase nano-LC-coupled LTQ Orbitrap XL ETD (Thermo Fisher Scientific). An Agilent 1200 series HPLC system was equipped with a 20 mm Aqua C18 (Phenomenex) trapping column (packed in-house, 100 μm inner diameter, 5 μm particle size) and a 400 mm ReproSil-Pur C18-AQ (Dr. Maisch GmbH) analytical column (packed in-house, 50 μm inner diameter, 3 μm particle size).
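For readers who want the SCX gradient from the prefractionation step in machine-readable form, the breakpoints given there can be written as a piecewise function. This is only a sketch: the text lists the start and end percentages of each segment, and the linear ramp within a segment is an assumption, as are the names `SCX_GRADIENT` and `percent_b`.

```python
# (time in min, % solvent B) breakpoints transcribed from the SCX gradient description.
SCX_GRADIENT = [(0, 0), (10, 0), (15, 26), (40, 35), (45, 60), (49, 100)]


def percent_b(t_min, breakpoints=SCX_GRADIENT):
    """Percent solvent B at time t_min, assuming linear ramps between breakpoints."""
    if t_min <= breakpoints[0][0]:
        return float(breakpoints[0][1])
    for (t0, b0), (t1, b1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # Past the last breakpoint: hold the final composition.
    return float(breakpoints[-1][1])


for t in (5, 12.5, 40, 47):
    print(t, percent_b(t))
```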
Trapping was performed at 5 μl/min solvent C (0.1 M acetic acid in water) for 10 min, and elution was achieved with a gradient from 10% to 30% (v/v) solvent D (0.1 M acetic acid in 1:4 acetonitrile : water) in solvent C in 110 min, followed by a gradient of 30% to 50% (v/v) solvent D in solvent C in 30 min, followed by a gradient of 50% to 100% (v/v) solvent D in solvent C in 5 min and finally 100% solvent D for 2 min. The flow rate was passively split from 0.45 ml/min to 100 nL/min. Nano-electrospray was achieved using a distally coated fused silica emitter (360 μm outer diameter, 20 μm inner diameter, 10 μm tip inner diameter, New Objective) biased to 1.7 kV. The instrument was operated in data dependent mode to automatically switch between MS and MS/MS. Survey full scan MS spectra were acquired from m/z 350 to m/z 1500 in the Orbitrap with a resolution of 60,000 at m/z 400 following accumulation to a target value of 500,000 in the linear ion trap. The two most intense ions above a threshold of 500 were fragmented in the linear ion trap using CID at an AGC target value of 30,000 and ETD with supplemental activation at an AGC target value of 50,000. The ETD reagent AGC target value was set to 100,000 and the reaction time to 50 ms.

Data Processing

From every raw data file recorded by the mass spectrometer, representing a single SCX fraction, two different peak lists containing either CID or ETD fragmentation data were generated using Proteome Discoverer (version 1.0, Thermo Fisher Scientific) with a signal-to-noise threshold of three and the following settings for the ETD-nonfragment filter: precursor peak removal with 4 Da, charge-reduced precursor removal with 8 Da, and removal of known neutral losses from charge-reduced precursors with 8 Da within a window of 120 Da.
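The effect of the precursor and charge-reduced precursor settings above can be sketched as a simple peak filter. This is not Proteome Discoverer's implementation: the function name is hypothetical, the 4 Da and 8 Da values are interpreted here as total window widths in m/z (an assumption), the electron mass is neglected, and the neutral-loss removal step is omitted.

```python
def etd_nonfragment_filter(peaks, precursor_mz, charge,
                           precursor_window=4.0, reduced_window=8.0):
    """Drop peaks near the precursor and its charge-reduced forms.

    peaks is a list of (mz, intensity) tuples. Window values are treated as
    total widths centered on each excluded m/z (an assumption).
    """
    # Charge reduction lowers the charge while the ion mass stays almost the same.
    ion_mass = precursor_mz * charge
    excluded = [(precursor_mz, precursor_window / 2)]
    for reduced_charge in range(1, charge):
        excluded.append((ion_mass / reduced_charge, reduced_window / 2))
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if all(abs(mz - center) > half_width for center, half_width in excluded)
    ]


spectrum = [(300.0, 2), (499.0, 10), (503.0, 5), (748.0, 7), (760.0, 3), (1503.9, 1)]
print(etd_nonfragment_filter(spectrum, precursor_mz=500.0, charge=3))
```

For a 3+ precursor at m/z 500 the filter excludes regions around m/z 500, 750, and 1500, leaving only peaks that can plausibly be sequence fragments.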
Single-fraction peak lists of the major peptide-containing SCX fractions for trypsin-derived and Lys-N-derived peptides were then merged into four larger peak lists, denoted CID-Tryp, ETD-Tryp, CID-LysN, and ETD-LysN. The whole data set is composed of 168,960 CID/ETD pairs. Of these, 87,096 pairs (51,233 with charge 2+, 24,854 with charge 3+, and 11,009 with charges 4+ and larger) are from the trypsin digests and 81,864 (24,284 with charge 2+, 28,168 with charge 3+ and 29,412 with charges 4+ and larger) are from the Lys-N digests. Spectra with precursor charges from 2+ to 7+ were considered in the further analyses. All the spectra (Raw files and mzXML files) and database search results associated with this manuscript may be downloaded from the Tranche repository (http://proteomecommons.org/tranche/) using the following hash:

mQTEDmtWauUPq41hJMPY/tnB3+zXhc5GSMKuRm+ljChFjtJrrrnJ4WwNpkgWM0/zGE0Zy/STG0NWJwTbbqMnInXrKi8AAAAAAAB5sA==

Mascot Analysis

Mascot (version 2.3.0, Matrix Science) was used to search the peak lists against an in-house built database (74,190 entries; 31,263,418 amino acids) assembled from the IPI human database (version 3.52, http://www.ebi.ac.uk/ipi) plus common contaminants (target database). A decoy database was constructed by reversing all sequences and slightly scrambling entries using MaxQuant (version 1.0.13.8; http://www.maxquant.org) (30. Cox J. Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008; 26: 1367-1372). The target and decoy databases were searched separately to estimate FDRs. The following parameters were used for database searching: 50 ppm precursor mass tolerance, 0.5 Da fragment ion tolerance, up to two missed cleavages allowed, carbamidomethyl cysteine as fixed modification, no variable modifications.
The enzyme was specified as either trypsin or Lys-N and the instrument type as either ESI-TRAP or ETD-TRAP.

Training MS-GF Scoring Parameters

MS-GF takes a set of PSMs as an input training set and outputs a scoring parameter file containing the parameters used for scoring (see Supplement 1 for details on training scoring parameters). We first generated initial scoring parameter files for the four data sets (CID-Tryp, ETD-Tryp, CID-LysN, and ETD-LysN) using PSMs with Mascot scores corresponding to peptide level FDRs less than 1% as a training set. Using these initial parameter files, we ran MS-GFDB and selected PSMs with MS-GF p values corresponding to peptide level FDRs less than 1%. These PSMs were used as a new training set to build the final scoring parameter files.

MS-GFDB Search (for CID or ETD spectra)

Because MS-GFDB automatically preprocesses spectra (see Supplement 1 for details), we converted each raw data file into an mzXML file using ReAdW 4.3.1 (31. Keller A. Eng J. Zhang N. Li X.J. Aebersold R. A uniform proteomics ms/ms analysis platform utilizing open xml file formats. Mol Syst Biol. 2005; 1 (2005.0017)) and used the mzXML file in the MS-GFDB search (as opposed to using Proteome Discoverer for noise and (charge-reduced) precursor filtering). MS-GFDB searches were carried out against the same database with the same parameters as were used for Mascot searches.

MS-GFDB uses two scores: the MS-GF score and the p value (both are computed by MS-GF). The MS-GF score is used to evaluate the quality of a PSM and the p value is used to assess the statistical significance of a PSM. To compute the MS-GF score, MS-GF first converts every spectrum into a Prefix-Residue Mass (PRM) spectrum (14. Tanner S. Shu H. Frank A. Wang L.C. Zandi E. Mumby M. Pevzner P.A. Bafna V. Inspect: identification
