Deep Coverage of the Escherichia coli Proteome Enables the Assessment of False Discovery Rates in Simple Proteogenomic Experiments

Article, Open access, Peer reviewed

2013; Elsevier BV; Volume: 12; Issue: 11; Language: English

10.1074/mcp.m113.029165

ISSN

1535-9484

Authors

Karsten Krug, Alejandro Carpy, Gesa Behrends, Katarina Matić, Nelson C. Soares, Boris Maček

Topic(s)

Bioinformatics and Genomic Networks

Abstract

Recent advances in mass spectrometry (MS) have led to increased applications of shotgun proteomics to the refinement of genome annotation. The typical "proteo-genomic" workflows rely on the mapping of peptide MS/MS spectra onto databases derived via six-frame translation of the genome sequence. These databases contain a large proportion of spurious protein sequences which make the statistical confidence of the resulting peptide spectrum matches difficult to assess. Here we performed a comprehensive analysis of the Escherichia coli proteome using LTQ-Orbitrap MS and mapped the corresponding MS/MS spectra onto a six-frame translation of the E. coli genome. We hypothesized that the protein-coding part of the E. coli genome approaches complete annotation and that the majority of six frame-specific (novel) peptide spectrum matches can be considered as false positive identifications. We confirm our hypothesis by showing that the posterior error probability distribution of novel hits is almost identical to that of reversed (decoy) hits; this enables us to estimate the sensitivity, specificity, accuracy, and false discovery rate in a typical bacterial proteo-genomic dataset. We use two complementary computational frameworks for processing and statistical assessment of MS/MS data: MaxQuant and Trans-Proteomic Pipeline. We show that MaxQuant achieves a more sensitive six-frame database search with an acceptable false discovery rate and is therefore well suited for global genome reannotation applications, whereas the Trans-Proteomic Pipeline achieves higher specificity and is well suited for high-confidence validation. The use of a small and well-annotated bacterial genome enables us to address genome coverage achieved in state-of-the-art bacterial proteomics: identified peptide sequences mapped to all expressed E. coli proteins but covered 31.7% of the protein-coding genome sequence. 
Our results show that false discovery rates can be substantially underestimated even in "simple" proteo-genomic experiments obtained by means of high-accuracy MS and point to the necessity of further improvements concerning the coverage of peptide sequences by MS-based methods.

MS-based proteomics has become an indispensable tool for studying in vivo protein expression on a global scale (1. Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207). Briefly, in a typical "shotgun" proteomic experiment, the whole proteome of an organism is extracted and digested by a protease (e.g. trypsin). The resulting complex peptide mixtures are usually further fractionated and separated via liquid chromatography (LC) before ionization and analysis in the mass spectrometer. Recent innovations in MS technology (2. Michalski A. Damoc E. Hauschild J.P. Lange O. Wieghaus A. Makarov A. Nagaraj N. Cox J. Mann M. Horning S. Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol. Cell. Proteomics. 2011; 10 (M111.011015); 3. Michalski A. Damoc E. Lange O. Denisov E. Nolting D. Muller M. Viner R. Schwartz J. Remes P. Belford M. Dunyach J.J. Cox J. Horning S. Mann M. Makarov A. Ultra high resolution linear ion trap Orbitrap mass spectrometer (Orbitrap Elite) facilitates top down LC MS/MS and versatile peptide fragmentation modes. Mol. Cell. Proteomics. 2012; 11 (O111.013698); 4. Olsen J.V. Schwartz J.C. Griep-Raming J. Nielsen M.L. Damoc E. Denisov E. Lange O. Remes P.
Taylor D. Splendore M. Wouters E.R. Senko M. Makarov A. Mann M. Horning S. A dual pressure linear ion trap Orbitrap instrument with very high sequencing speed. Mol. Cell. Proteomics. 2009; 8: 2759-2769) enable high peptide sequencing rates with high mass accuracy and sensitivity, placing the routine analysis of entire proteomes within reach (5. de Godoy L.M. Olsen J.V. Cox J. Nielsen M.L. Hubner N.C. Frohlich F. Walther T.C. Mann M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature. 2008; 455: 1251-1254; 6. Picotti P. Bodenmiller B. Mueller L.N. Domon B. Aebersold R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell. 2009; 138: 795-806).

Modern genome annotation uses computational ab initio approaches to predict coding regions and gene models from raw sequencing data (7. Frishman D. Valencia A. Modern Genome Annotation: The BioSapiens Network. Springer, New York, 2009; 8. Brent M.R. Genome annotation past, present, and future: how to define an ORF at each locus. Genome Res. 2005; 15: 1777-1786). As the ultimate evidence of gene expression is the detection of its product, transcriptomic data are commonly used to train gene prediction algorithms (9. Stanke M. Diekhans M. Baertsch R. Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008; 24: 637-644). Similarly, MS-based proteomics is increasingly used in genome annotation. In a typical proteo-genomics experiment, MS/MS spectra of peptides are searched against databases derived via in silico six-frame translation of the whole genome sequence (10. Kuster B. Mortensen P. Andersen J.S. Mann M.
Mass spectrometry allows direct identification of proteins in large genomes. Proteomics. 2001; 1: 641-650; 11. Armengaud J. Proteo-genomics and systems biology: quest for the ultimate missing parts. Expert Rev. Proteomics. 2010; 7: 65-77; 12. Yates 3rd, J.R. Eng J.K. McCormack A.L. Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 1995; 67: 3202-3210; 13. Castellana N. Bafna V. Proteo-genomics to discover the full coding content of genomes: a computational perspective. J. Proteomics. 2010; 73: 2124-2135; 14. Tanner S. Shen Z.X. Ng J. Florea L. Guigo R. Briggs S.P. Bafna V. Improving gene annotation using peptide mass spectrometry. Genome Res. 2007; 17: 231-239). This approach has been applied, alone or in combination with transcriptomic data, in order to refine genome annotation in several organisms, including C. elegans (15. Merrihew G.E. Davis C. Ewing B. Williams G. Kall L. Frewen B.E. Noble W.S. Green P. Thomas J.H. MacCoss M.J. Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations. Genome Res. 2008; 18: 1660-1669), P. pacificus (16. Borchert N. Dieterich C. Krug K. Schutz W. Jung S. Nordheim A. Sommer R.J. Macek B. Proteo-genomics of Pristionchus pacificus reveals distinct proteome structure of nematode models. Genome Res. 2010; 20: 837-846), S. cerevisiae (17. Oshiro G. Wodicka L.M. Washburn M.P. Yates J.R. Lockhart D.J. Winzeler E.A. Parallel identification of new genes in Saccharomyces cerevisiae. Genome Res. 2002; 12: 1210-1220), S. pombe (18. Bitton D.A. Wood V. Scutt P.J. Grallert A. Yates T. Smith D.L.
Hagan I.M. Miller C.J. Augmented annotation of the Schizosaccharomyces pombe genome reveals additional genes required for growth and viability. Genetics. 2011; 187: 1207-1217), A. thaliana (19. Castellana N.E. Payne S.H. Shen Z.X. Stanke M. Bafna V. Briggs S.P. Discovery and revision of Arabidopsis genes by proteo-genomics. Proc. Natl. Acad. Sci. U.S.A. 2008; 105: 21034-21038), S. nodorum (20. Bringans S. Hane J.K. Casey T. Tan K.C. Lipscombe R. Solomon P.S. Oliver R.P. Deep proteo-genomics; high throughput gene validation by multidimensional liquid chromatography and mass spectrometry of proteins from the fungal wheat pathogen Stagonospora nodorum. BMC Bioinformatics. 2009; 10: 301), T. gondii (21. Xia D. Sanderson S.J. Jones A.R. Prieto J.H. Yates J.R. Bromley E. Tomley F.M. Lal K. Sinden R.E. Brunk B.P. Roos D.S. Wastling J.M. The proteome of Toxoplasma gondii: integration with the genome provides novel insights into gene expression and annotation. Genome Biol. 2008; 9: R116), A. gambiae (22. Kalume D.E. Peri S. Reddy R. Zhong J. Okulate M. Kumar N. Pandey A. Genome annotation of Anopheles gambiae using mass spectrometry-derived data. BMC Genomics. 2005; 6: 128), mouse (23. Brosch M. Saunders G.I. Frankish A. Collins M.O. Yu L. Wright J. Verstraten R. Adams D.J. Harrow J. Choudhary J.S. Hubbard T. Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome. Genome Res. 2011; 21: 756-767), and human (24. Bitton D.A. Smith D.L. Connolly Y. Scutt P.J. Miller C.J. An integrated mass-spectrometry pipeline identifies novel protein coding-regions in the human genome. PLoS One. 2010; 5: e8949; 25. Fermin D. Allen B.B.
Blackwell T.W. Menon R. Adamski M. Xu Y. Ulintz P. Omenn G.S. States D.J. Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics. Genome Biol. 2006; 7: R35).

Bacteria are especially well suited for MS-assisted genome annotation because of their relatively simple genome structures and small genome sizes, which lead to overall better sequence coverage in a typical proteomics experiment (26. Armengaud J. Microbiology and proteomics, getting the best of both worlds! Environ. Microbiol. 2012; 15: 12-23; 27. Armengaud J. A perfect genome annotation is within reach with the proteomics and genomics alliance. Curr. Opin. Microbiol. 2009; 12: 292-300; 28. Chen W.B. Laidig K.E. Park Y. Park K. Yates J.R. Lamont R.J. Hackett M. Searching the Porphyromonas gingivalis genome with peptide fragmentation mass spectra. Analyst. 2001; 126: 52-57; 29. Wang R. Prince J.T. Marcotte E.M. Mass spectrometry of the M. smegmatis proteome: protein expression levels correlate with function, operons, and codon bias. Genome Res. 2005; 15: 1118-1126; 30. de Souza G.A. Malen H. Softeland T. Saelensminde G. Prasad S. Jonassen I. Wiker H.G. High accuracy mass spectrometry analysis as a tool to verify and improve gene annotation using Mycobacterium tuberculosis as an example. BMC Genomics. 2008; 9: 316; 31. de Souza G.A. Softeland T. Koehler C.J. Thiede B. Wiker H.G. Validating divergent ORF annotation of the Mycobacterium leprae genome through a full translation data set and peptide identification by tandem mass spectrometry. Proteomics. 2009; 9: 3233-3243; 32. Kelkar D.S. Kumar D. Kumar P. Balakrishnan L. Muthusamy B. Yadav A.K. Shrivastava P. Marimuthu A. Anand S. Sundaram H.
Kingsbury R. Harsha H.C. Nair B. Prasad T.S. Chauhan D.S. Katoch K. Katoch V.M. Chaerkady R. Ramachandran S. Dash D. Pandey A. Proteo-genomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry. Mol. Cell. Proteomics. 2011; 10 (M111.011627); 33. Venter E. Smith R.D. Payne S.H. Proteo-genomic analysis of bacteria and archaea: a 46 organism case study. PLoS One. 2011; 6: e27587).

The use of six-frame databases in proteo-genomics experiments is challenging because of their large sizes, which increase the search space as well as affect the sensitivity of database searches (34. Krug K. Nahnsen S. Macek B. Mass spectrometry at the interface of proteomics and genomics. Mol. Biosyst. 2011; 7: 284-291). Additionally, these databases contain a high proportion of artificial sequences resulting from frames that are not transcribed (13. Castellana N. Bafna V. Proteo-genomics to discover the full coding content of genomes: a computational perspective. J. Proteomics. 2010; 73: 2124-2135; 35. Blakeley P. Overton I.M. Hubbard S.J. Addressing statistical biases in nucleotide-derived protein databases for proteo-genomic search strategies. J. Proteome Res. 2012; 11: 5221-5234). These spurious protein sequences are difficult to discriminate from the true protein sequences, which makes the statistical confidence of the resulting peptide spectrum matches (PSMs) difficult to calculate.

The abbreviations used are: ABC, ammonium bicarbonate; ACN, acetonitrile; FDR, false discovery rate; MMA, mixture-model approach; ORF, open reading frame; PEP, posterior error probability; PSM, peptide spectrum match; TDA, target-decoy approach; TPP, Trans-Proteomic Pipeline.

Here we take advantage of the small size (4.6 Mb), simple architecture, and high annotation level of the Escherichia coli genome and use it as a benchmark model for proteo-genomic data interpretation. We derive a comprehensive dataset of proteins expressed in the exponential growth of Escherichia coli and map the corresponding MS/MS spectra onto a six-frame translation of the E. coli genome. We hypothesize that the protein-coding part of the E. coli genome approaches complete annotation, and we consider six frame-specific (novel) PSMs as wrongly identified. This enables us to estimate the factual false discovery rate in a simple proteogenomic experiment. We show that the posterior error probability (PEP) distribution of novel peptides is almost identical to that of decoy (reversed) hits, which validates our assumption and points to the accumulation of false positive PSMs within novel peptide identifications. Our dataset comprises 2600 E. coli proteins, approaching the identification of the complete proteome expressed during exponential growth (36. Iwasaki M. Miwa S. Ikegami T. Tomita M. Tanaka N. Ishihama Y. One-dimensional capillary liquid chromatographic separation coupled with tandem mass spectrometry unveils the Escherichia coli proteome on a microarray scale. Anal. Chem.
2010; 82: 2616-2620), but covers only 31.7% of the protein-coding genome sequence.

Wild-type E. coli strain K12 (isolate BW25113) (37. Baba T. Ara T. Hasegawa M. Takai Y. Okumura Y. Baba M. Datsenko K.A. Tomita M. Wanner B.L. Mori H. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2006; 2 (2006.0008)) was inoculated in 5 ml lysogeny broth (Luria/Miller) medium and grown at 37 °C under vigorous shaking for 24 h (A600 = 1.9); 1 ml of the stationary culture was then spun down at 260 × g for 10 min in order to remove any residual Luria/Miller medium. The bacterial cells were washed twice with M9 minimal medium consisting of M9 salts (6.78 g/l Na2HPO4, 3 g/l KH2PO4, 0.5 g/l NaCl, 1 g/l NH4Cl; Sigma-Aldrich) supplemented with 0.5% (w/v) glucose, 33 μM thiamine, 1 mM MgSO4, and 0.1 mM CaCl2. Next, the resultant pellet was resuspended in a final volume of 1 ml M9. Immediately afterward, 5 μl of this culture was used to inoculate 5 ml of fresh M9 medium containing 0.25 mg/ml lysine (Sigma-Aldrich). Minimal medium cell cultures were grown overnight at 37 °C under vigorous shaking to an A600 of 0.5 and used to inoculate (1:100 dilution) 125 ml of fresh minimal medium containing 0.25 mg/ml lysine. The cell cultures were grown to A600 = 0.5, harvested via centrifugation at 3345 × g for 10 min, washed with phosphate buffered saline, and snap-frozen in liquid nitrogen.

The frozen cell pellets were resuspended in 3 to 5 ml lysis buffer (pH 7.5) containing 2 mg/ml lysozyme (Sigma-Aldrich) in 50 mM Tris/HCl buffer, 1 mM EDTA, and 5 mM of each of the following phosphatase inhibitors: glycerol-2-phosphate, sodium fluoride (Sigma-Aldrich, Karlsruhe, Germany), and sodium orthovanadate (Alfa Aesar). Cell wall lysis was performed at 37 °C for 15 min, and DNA was fragmented with benzonase (1875 U) (Merck) for an additional 10 min.
For the solubilization of membrane proteins, lithium dodecylsulfate (Sigma-Aldrich) was added to a final concentration of 1% (w/v) and samples were incubated at 37 °C under vigorous shaking for 15 min. Cell debris was removed via centrifugation at 3345 × g for 5 min and repeated centrifugation of the supernatant at 11,300 × g for 10 min. The crude protein extract was methanol/chloroform precipitated, and the protein precipitates were redissolved in denaturation buffer containing 6 M urea/2 M thiourea in 10 mM Tris buffer. For estimation of the protein concentration, each extract was measured via Bradford assay (Bio-Rad).

In-gel digestion was performed as previously described (16. Borchert N. Dieterich C. Krug K. Schutz W. Jung S. Nordheim A. Sommer R.J. Macek B. Proteo-genomics of Pristionchus pacificus reveals distinct proteome structure of nematode models. Genome Res. 2010; 20: 837-846). Briefly, extracted proteins were separated on a NuPage Bis-Tris 4–12% gradient gel (Invitrogen). The gel was stained with Coomassie Blue and subsequently cut into 15 slices. The resulting gel pieces were destained by washing three times with 10 mM ammonium bicarbonate (ABC) and acetonitrile (ACN) (1:1, v/v). Proteins were then reduced with 10 mM dithiothreitol (DTT) in 20 mM ABC for 45 min at 56 °C and alkylated with 55 mM iodoacetamide in 20 mM ABC for 30 min at room temperature in the dark. After being washed two times with 5 mM ABC and one time with ACN, the gel pieces were dehydrated in a vacuum centrifuge. Proteins were digested with either trypsin (Promega, Fitchburg, WI) or Lys-C (Wako, Neuss, Germany) (12.5 ng/μl in 20 mM ABC) at 37 °C overnight. The resulting peptides were extracted in three subsequent steps with the following solutions: (i) 3% TFA in 30% ACN, (ii) 0.5% acetic acid in 80% ACN, and (iii) 100% ACN. After evaporation of the ACN in a vacuum centrifuge, peptide fractions were desalted using Stage-Tips (38. Ishihama Y.
Rappsilber J. Mann M. Modular stop and go extraction tips with stacked disks for parallel and multidimensional peptide fractionation in proteomics. J. Proteome Res. 2006; 5: 988-994).

For in-solution digestion, protein extracts were reduced for 1 h at room temperature with 1 mM DTT and subsequently alkylated with 1 mM iodoacetamide for 1 h at room temperature in the dark. Proteins were pre-digested with Lys-C (1:100 w/w) for 3 h at room temperature. After dilution with 4 volumes of 20 mM ABC, proteins were digested overnight at room temperature with either trypsin (1:100 w/w) or Lys-C (1:100 w/w).

Peptides derived from the in-solution digestion were separated according to their isoelectric point using the 3100 OffGel fractionator (Agilent, Santa Clara, CA) following the manufacturer's instructions. Peptide mixtures were separated into 12 fractions using 13-cm Immobiline DryStrips with a pH 3–10 gradient (GE Healthcare). Separation was performed at a maximum current of 50 μA until 50 kVh were reached. Peptide fractions were acidified with acidic solution (30% ACN, 5% acetic acid, and 10% TFA in water) and desalted using Stage-Tips.

Peptides from the in-solution digestion were desalted using solid phase extraction. Strong anion exchange chromatography was performed as described elsewhere (39. Wisniewski J.R. Zougman A. Mann M. Combination of FASP and StageTip-based fractionation allows in-depth analysis of the hippocampal membrane proteome. J. Proteome Res. 2009; 8: 5674-5678). Briefly, desalted peptides were loaded at pH 11 onto an anion exchange column containing six layers of Empore/Disk Anion Exchange (3M, St. Paul, MN) in a 200-μl pipette tip. For conditioning and elution, Britton & Robinson Universal Buffer (0.02 M CH3COOH, 0.02 M H3PO4, and 0.02 M H3BO3) at pH 3, 4, 5, 6, 8, and 11 was prepared. The column was activated with methanol and conditioned with 1 M NaOH followed by buffer (pH 11).
The flow-through was acidified with acidic solution and loaded on a Stage-Tip. Peptides were eluted at pH 8, 6, 5, 4, and 3, acidified with acidic solution, and desalted using Stage-Tips.

All peptide fractions were measured on an EASY-nLC II nano-LC (Proxeon Biosystems, Odense, Denmark) coupled to an Orbitrap Velos mass spectrometer (Thermo Fisher Scientific). Chromatographic separation was done on a 15-cm PicoTip fused silica emitter with an inner diameter of 75 μm and an 8-μm tip inner diameter (New Objective, Woburn, MA) packed in-house with reversed-phase ReproSil-Pur C18-AQ 3-μm resin (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany). Peptides were injected into the column with solvent A (0.5% acetic acid) at 700 nl/min using a maximum pressure of 280 bar. Peptides were then eluted using an 81-min or a 221-min segmented gradient of 5%–50% solvent B (80% ACN in 0.5% acetic acid) at a flow rate of 200 nl/min. The mass spectrometer was operated in data-dependent mode. Survey full scans for the MS spectra were recorded between 300 and 2000 Thomson at a resolution of 60,000 with a target value of 1E6 charges. The 15 most intense peaks from the survey scans were selected for fragmentation with collision-induced dissociation at a target value of 5000 charges. The fragment spectra were recorded in the linear ion trap. Selected masses were included in a dynamic exclusion list for 90 s.

Acquired MS data were preprocessed by MaxQuant (v.1.2.2.9) (40. Cox J. Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008; 26: 1367-1372) in order to generate peak lists that could be submitted to a database search. Derived peak lists were submitted to the Andromeda (41. Cox J. Neuhauser N. Michalski A. Scheltema R.A. Olsen J.V. Mann M. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res.
2011; 10: 1794-1805) and Mascot v2.2.0 (Matrix Science, London, UK) search engines to query the genome database translated into all six reading frames.

The genome sequence of E. coli (42. Riley M. Abe T. Arnaud M.B. Berlyn M.K. Blattner F.R. Chaudhuri R.R. Glasner J.D. Horiuchi T. Keseler I.M. Kosuge T. Mori H. Perna N.T. Plunkett 3rd, G. Rudd K.E. Serres M.H. Thomas G.H. Thomson N.R. Wishart D. Wanner B.L. Escherichia coli K-12: a cooperatively developed annotation snapshot—2005. Nucleic Acids Res. 2006; 34: 1-9; 43. Hayashi K. Morooka N. Yamamoto Y. Fujita K. Isono K. Choi S. Ohtsubo E. Baba T. Wanner B.L. Mori H. Horiuchi T. Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol. 2006; 2 (2006.0007)) was downloaded from the NCBI homepage (accession number NC_000913.2). The translation into all six reading frames was done from stop codon to stop codon by applying the bacterial and plant plastid code (translation table 11) using the Transeq tool that is part of the EMBOSS software package (44. Rice P. Longden I. Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000; 16: 276-277). We required a minimal length of six amino acids for each resulting putative open reading frame (ORF), which corresponds to the minimal peptide length that we required in the database search. To that database we added decoy sequences using the SequenceReverse.exe tool shipped with MaxQuant software. The resulting database consisted of 263,159 putative ORFs, 248 commonly observed lab contaminants, and 263,407 reversed sequences.

A database search was performed with the precursor mass tolerance set to 6 and 7 ppm for Andromeda and Mascot database searches, respectively.
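The database construction described above (stop-to-stop translation in all six frames, a six-residue minimum length, and reversed decoy entries) can be illustrated with a minimal Python sketch. This is not the Transeq or SequenceReverse.exe implementation; the function names (`six_frame_orfs`, `reversed_decoys`) and the toy sequence are hypothetical, and the codon table used is the standard one, whose amino acid assignments match those of translation table 11 (the two differ only in permitted start codons, which play no role in stop-to-stop translation).

```python
from itertools import product

# Standard genetic code in TCAG order; amino acid assignments are the same
# as in NCBI translation table 11 (only the start codons differ).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(genome, min_len=6):
    """Collect stop-to-stop segments of at least min_len residues.

    Returns (strand, frame, protein) tuples for all six reading frames.
    """
    genome = genome.upper()
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            # Splitting at '*' yields the stretches between stop codons.
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_len:
                    orfs.append((strand, frame, segment))
    return orfs


def reversed_decoys(orfs):
    """Build one reversed (decoy) sequence per target entry."""
    return [protein[::-1] for _, _, protein in orfs]


if __name__ == "__main__":
    toy = "ATGGCTAAAGGTCTGGAATAA"  # hypothetical 21-nt example, not E. coli data
    targets = six_frame_orfs(toy)
    for (strand, frame, protein), decoy in zip(targets, reversed_decoys(targets)):
        print(strand, frame, protein, decoy)
```

Because every stop-to-stop stretch is kept regardless of whether it begins with a start codon, most entries in such a database are not real proteins, which is the source of the spurious sequences discussed earlier.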
The fragment ion mass tolerance was set to 0.5 Da for both search engines. Full enzyme specificity for trypsin and Lys-C was required, and up to two missed cleavages were allowed. Oxidation of methionine and protein N-terminal acetylation were defined as variable modifications, and carbamidomethylation of cysteine was defined as a fixed modification.

The resulting lists of PSMs were further processed by MaxQuant and Trans-Proteomic Pipeline (v4.5 RAPTURE rev 0) (45. Keller A. Nesvizhskii A.I. Kolker E. Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002; 74: 5383-5392). Andromeda database scores calculated by MaxQuant were converted to PEPs as described in Ref. 41 (Cox J. Neuhauser N. Michalski A. Scheltema R.A. Olsen J.V. Mann M. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 2011; 10: 1794-1805). We calculated q-values by sorting the PSMs by their PEPs in ascending order. For each PSM we calculated the ratio between the number of decoy hits and the number of target PSMs having PEPs below the PEP of the actual PSM.

Mascot result (.dat) files were converted to pepXML format and further processed by the PeptideProphet (45. Keller A. Nesvizhskii A.I. Kolker E. Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002; 74: 5383-5392) module as part of the Trans-Proteomic Pipeline (TPP). We used the accurate mass binning option, excluded singly charged peptides, and used decoy hits to model the score distribution of false positives for semi-supervised mixture modeling. The false discovery rate (FDR) was controlled by filtering PSMs according to the probability assigned by PeptideProphet.
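The q-value calculation described above for the PEP-ranked PSMs (decoy hits divided by target hits among all PSMs up to a given PEP) can be sketched as follows. This is an illustration under stated assumptions rather than the MaxQuant code: the function name `q_values` and the `(pep, is_decoy)` input layout are hypothetical, the count includes the PSM itself, and the final step that makes the values non-decreasing down the ranked list is a common convention that the text does not mention.

```python
def q_values(psms):
    """Estimate q-values from PSMs given as (pep, is_decoy) pairs.

    Returns (pep, is_decoy, q) tuples sorted by ascending PEP.
    """
    ranked = sorted(psms, key=lambda psm: psm[0])

    # Running decoy/target ratio at each rank (the PSM itself is counted).
    decoys = targets = 0
    ratios = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumption: with no targets seen yet, report the maximal rate 1.0.
        ratios.append(decoys / targets if targets else 1.0)

    # Assumption: enforce monotonicity by taking, for each rank, the minimum
    # ratio observed at that rank or any later (less confident) rank.
    q = []
    running_min = float("inf")
    for ratio in reversed(ratios):
        running_min = min(running_min, ratio)
        q.append(running_min)
    q.reverse()

    return [(pep, is_decoy, qv) for (pep, is_decoy), qv in zip(ranked, q)]


if __name__ == "__main__":
    # Hypothetical toy input: three target hits and two decoy hits.
    toy = [(0.02, False), (0.001, False), (0.5, True), (0.01, True), (0.002, False)]
    for pep, is_decoy, qv in q_values(toy):
        print(f"PEP={pep:<6} decoy={is_decoy!s:<5} q={qv:.3f}")
```

Filtering at a chosen threshold then amounts to keeping the target PSMs whose q-value does not exceed it, for example 0.01 for a 1% FDR.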
The corresponding probability threshold was calculated by the calctppstat.pl Perl script as part of the TPP, and the "Approx. P threshold for FDR" was used to filter the list of PSMs.

Acquired MS data were additionally searched against a recent annotation of the E. coli genome (UniProt reference proteome set; downloaded on January 18, 2012; 4309 protein entries) using MaxQuant v1.2.2.9 operating with the same database search parameters as described above. FDRs on peptide and protein group levels were set at 1%. Detected peptide sequences that resulted from searching the six-fram
