N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana
2017; Elsevier BV; Volume: 16; Issue: 6; Language: English
10.1074/mcp.m116.066662
ISSN: 1535-9484
Authors: Patrick J. Willems, Elvis Ndah, Veronique Jonckheere, Simon Stael, Adriaan Sticker, Lennart Martens, Frank Van Breusegem, Kris Gevaert, Petra Van Damme
Topic(s): Machine Learning in Bioinformatics
Abstract: Proteogenomics is an emerging research field that still lacks a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, by analyzing protein N-terminal peptides, an in silico analysis demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly-annotated genomes.
Proteogenomics is an interdisciplinary research field combining proteomics, transcriptomics, and genomics with the aim of delineating protein-coding regions in genomes, thereby aiding protein discovery and genome annotation (1. Jaffe J.D. Berg H.C. Church G.M. Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics. 2004; 4: 59-77; 2. Nesvizhskii A.I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods. 2014; 11: 1114-1125). 
Such strategies have identified new variants of proteins, termed proteoforms (3. Smith L.M. Kelleher N.L. Consortium for Top Down Proteomics. Proteoform: a single term describing protein complexity. Nat. Methods. 2013; 10: 186-187), which arise from nucleotide polymorphisms (4. Zhang B. Wang J. Wang X. Zhu J. Liu Q. Shi Z. Chambers M.C. Zimmerman L.J. Shaddox K.F. Kim S. Davies S.R. Wang S. Wang P. Kinsinger C.R. Rivers R.C. Rodriguez H. Townsend R.R. Ellis M.J. Carr S.A. Tabb D.L. Coffey R.J. Slebos R.J. Liebler D.C. NCI CPTAC. Proteogenomic characterization of human colon and rectal cancer. Nature. 2014; 513: 382-387; 5. Ruggles K.V. Tang Z. Wang X. Grover H. Askenazi M. Teubl J. Cao S. McLellan M.D. Clauser K.R. Tabb D.L. Mertins P. Slebos R. Erdmann-Gilmore P. Li S. Gunawardena H.P. Xie L. Liu T. Zhou J.Y. Sun S. Hoadley K.A. Perou C.M. Chen X. Davies S.R. Maher C.A. Kinsinger C.R. Rodland K.D. Zhang H. Zhang Z. Ding L. Townsend R.R. Rodriguez H. Chan D. Smith R.D. Liebler D.C. Carr S.A. Payne S. Ellis M.J. Fenyo D. An analysis of the sensitivity of proteogenomic mapping of somatic mutations and novel splicing events in cancer. Mol. Cell. Proteomics. 2016; 15: 1060-1071; 6. Cesnik A.J. Shortreed M.R. Sheynkman G.M. Frey B.L. Smith L.M. Human proteomic variation revealed by combining RNA-Seq proteogenomics and global post-translational modification (G-PTM) search strategy. J. Proteome Res. 2016; 15: 800-808), alternative translation initiation (i.e. 
N-terminal (Nt)-proteoforms [Footnote 1. The abbreviations used are: Nt, N-terminal; 6-FT, six-frame translation; COFRADIC, combined fractional diagonal chromatography; EMBOSS, European molecular biology open software suite; FDR, false discovery rate; HARR, harringtonine; IGV, integrative genome viewer; iMet, initiator Methionine; LTM, lactimidomycin; NME, N-terminal methionine excision; PCV, packed cell volume; PSM, peptide-to-spectrum match; SCX, strong cation exchange; TAILS, terminal amine isotopic labeling of substrates; TIS, translation initiation site; uORF, upstream ORF.] (7. Gawron D. Gevaert K. Van Damme P. The proteome under translational control. Proteomics. 2014; 14: 2647-2662; 8. Crappe J. Ndah E. Koch A. Steyaert S. Gawron D. De Keulenaer S. De Meester E. De Meyer T. Van Criekinge W. Van Damme P. Menschaert G. PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration. Nucleic Acids Res. 2015; 43: e29)), splicing (5. Ruggles K.V. Tang Z. Wang X. Grover H. Askenazi M. Teubl J. Cao S. McLellan M.D. Clauser K.R. Tabb D.L. Mertins P. Slebos R. Erdmann-Gilmore P. Li S. Gunawardena H.P. Xie L. Liu T. Zhou J.Y. Sun S. Hoadley K.A. Perou C.M. Chen X. Davies S.R. Maher C.A. Kinsinger C.R. Rodland K.D. Zhang H. Zhang Z. Ding L. Townsend R.R. Rodriguez H. Chan D. Smith R.D. Liebler D.C. Carr S.A. Payne S. Ellis M.J. Fenyo D. 
An analysis of the sensitivity of proteogenomic mapping of somatic mutations and novel splicing events in cancer. Mol. Cell. Proteomics. 2016; 15: 1060-1071; ref. 6; 9. Li H.D. Menon R. Omenn G.S. Guan Y. Revisiting the identification of canonical splice isoforms through integration of functional genomics and proteomics evidence. Proteomics. 2014; 14: 2709-2718), frame-shifts (10. Feng Y. Chien K.Y. Chen H.L. Chiu C.H. Pseudogene recoding revealed from proteomic analysis of salmonella serovars. J. Proteome Res. 2012; 11: 1715-1719) and post-translational modifications (ref. 6). Proteogenomic strategies vary depending on the experimental data used and the annotation depth of the studied model system (11. Zhang K. Fu Y. Zeng W.F. He K. Chi H. Liu C. Li Y.C. Gao Y. Xu P. He S.M. A note on the false discovery rate of novel peptides in proteogenomics. Bioinformatics. 2015; 31: 3249-3253). Important for proteomics-driven proteogenomics are customized protein databases that allow for more accurate protein identification using tandem mass spectrometry (MS/MS) data, thereby leading to the refinement of protein-coding gene segments and the discovery of novel gene products. 
In Arabidopsis, previous proteogenomic studies reported on the use of a protein sequence database based on six-frame translation (6-FT) of the entire genome (12. Castellana N.E. Payne S.H. Shen Z. Stanke M. Bafna V. Briggs S.P. Discovery and revision of Arabidopsis genes by proteogenomics. Proc. Natl. Acad. Sci. U.S.A. 2008; 105: 21034-21038; 13. Baerenfaller K. Grossmann J. Grobei M.A. Hull R. Hirsch-Hoffmann M. Yalovsky S. Zimmermann P. Grossniklaus U. Gruissem W. Baginsky S. Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science. 2008; 320: 938-941), which was searched in parallel with ab initio predicted genes in the case of Castellana et al. (ref. 12). Overall, these efforts resulted in the reclassification of 99 pseudogenes into protein-coding genes, next to the refinement of existing gene structures in the TAIR9 genome release (refs. 12, 13; 14. Lamesch P. Berardini T.Z. Li D. Swarbreck D. Wilks C. Sasidharan R. Muller R. Dreher K. Alexander D.L. Garcia-Hernandez M. Karthikeyan A.S. Lee C.H. Nelson W.D. Ploetz L. Singh S. Wensel A. Huala E. 
The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 2012; 40: D1202-D1210). Besides 6-FT or genome-based gene prediction, OMICS data can also aid in the rational design of customized protein databases (ref. 2; 15. Menschaert G. Fenyo D. Proteogenomics from a bioinformatics angle: A growing field. Mass Spectrom. Rev. 2015; 9999: 1-16). By providing direct evidence of in vivo protein synthesis, the sequencing of ribosome-protected mRNA fragments by ribosome profiling (ribo-seq) serves such a purpose. In eukaryotes, ribosomes can be specifically halted at translation initiation sites (TIS) using initiation-specific translation inhibitors (e.g. lactimidomycin and harringtonine; 16. Lee S. Liu B. Lee S. Huang S.X. Shen B. Qian S.B. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl. Acad. Sci. U.S.A. 2012; 109: E2424-E2432; 17. Ingolia N.T. Lareau L.F. Weissman J.S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 2011; 147: 789-802). By depleting elongating ribosomes, this approach allows mapping of the translation initiation landscape and, concomitantly, ORF delineation (ref. 16; 17. Ingolia N.T. Lareau L.F. Weissman J.S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 
2011; 147: 789-802; 18. Raj A. Wang S.H. Shim H. Harpak A. Li Y.I. Engelmann B. Stephens M. Gilad Y. Pritchard J.K. Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife. 2016; 5: e13328). We previously used such ribo-seq data to generate customized databases for MS/MS searches, resulting in the identification of proteoforms initiating at near-cognate start sites, N-terminally truncated and extended proteoforms, translation products of upstream ORFs as well as previously unannotated proteins (ref. 8; 19. Menschaert G. Van Criekinge W. Notelaers T. Koch A. Crappe J. Gevaert K. Van Damme P. Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events. Mol. Cell. Proteomics. 2013; 12: 1780-1790; 20. Koch A. Gawron D. Steyaert S. Ndah E. Crappe J. De Keulenaer S. De Meester E. Ma M. Shen B. Gevaert K. Van Criekinge W. Van Damme P. Menschaert G. A proteogenomics approach integrating proteomics and ribosome profiling increases the efficiency of protein identification and enables the discovery of alternative translation start sites. Proteomics. 2014; 14: 2688-2698; 21. Gawron D. Ndah E. Gevaert K. Van Damme P. Positional proteomics reveals differences in N-terminal proteoform stability. Mol. Syst. Biol. 2016; 12: 858). 
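The six-frame translation (6-FT) approach mentioned above can be illustrated with a minimal Python sketch. It is not the code used in the cited studies; the function names are hypothetical, the standard genetic code is assumed, and stop codons are simply written as `*` rather than used to split ORFs.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order map onto AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Translate the three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translation("ATGGCC").items()):
        print(frame, protein)
```

Because every genomic position is translated in all six frames, the resulting database is far larger than an annotated proteome, which is the motivation for the reduced search spaces discussed in the text.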
Whereas shotgun proteomic data have been primarily used for proteogenomic studies, data originating from subproteome analysis have proven to be valuable as well. For instance, a peptidomic workflow that enriches for small proteins and peptides was used for the discovery of protein-coding small ORFs in human (22. Slavoff S.A. Mitchell A.J. Schwaid A.G. Cabili M.N. Ma J. Levin J.Z. Karger A.D. Budnik B.A. Rinn J.L. Saghatelian A. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 2013; 9: 59-64; 23. Ma J. Diedrich J.K. Jungreis I. Donaldson C. Vaughan J. Kellis M. Yates 3rd, J.R. Saghatelian A. Improved identification and analysis of small open reading frame encoded polypeptides. Anal. Chem. 2016; 88: 3967-3975). In Arabidopsis, a proteogenomic study (ref. 12) made use of enriched phosphopeptides as these often originate from low abundant proteins that can be absent in shotgun proteomics data (24. Vu L.D. Stes E. Van Bel M. Nelissen H. Maddelein D. Inze D. Coppens F. Martens L. Gevaert K. De Smet I. Up-to-date workflow for plant (phospho)proteomics identifies differential drought-responsive phosphorylation events in maize leaves. J. Proteome Res. 2016; 15: 4304-4317). Further, positional proteomics, enriching for peptides holding protein N termini that can be considered as proxies of translation initiation, has been used for discovering and refining protein-coding gene structures in mouse and human cells (8. Crappe J. Ndah E. Koch A. Steyaert S. Gawron D. De Keulenaer S. De Meester E. De Meyer T. Van Criekinge W. Van Damme P. Menschaert G. 
PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration. Nucleic Acids Res. 2015; 43: e29; refs. 18, 19, 20), as well as in bacteria (25. Baudet M. Ortet P. Gaillard J.C. Fernandez B. Guerin P. Enjalbal C. Subra G. de Groot A. Barakat M. Dedieu A. Armengaud J. Proteomics-based refinement of Deinococcus deserti genome annotation reveals an unwonted use of non-canonical translation initiation codons. Mol. Cell. Proteomics. 2010; 9: 415-426; 26. Bland C. Hartmann E.M. Christie-Oleza J.A. Fernandez B. Armengaud J. N-Terminal-oriented proteogenomics of the marine bacterium Roseobacter denitrificans Och114 using (N-Succinimidyloxycarbonylmethyl)tris(2,4,6-trimethoxyphenyl)phosphonium bromide (TMPP) labeling and diagonal chromatography. Mol. Cell. Proteomics. 
2014; 13: 1369-1381; 27. Gallien S. Perrodou E. Carapito C. Deshayes C. Reyrat J.M. Van Dorsselaer A. Poch O. Schaeffer C. Lecompte O. Ortho-proteogenomics: multiple proteomes investigation through orthology and a new MS-based protocol. Genome Res. 2009; 19: 128-135) and archaea (28. Yamazaki S. Yamazaki J. Nishijima K. Otsuka R. Mise M. Ishikawa H. Sasaki K. Tago S. Isono K. Proteome analysis of an aerobic hyperthermophilic crenarchaeon, Aeropyrum pernix K1. Mol. Cell. Proteomics. 2006; 5: 811-823; 29. Aivaliotis M. Gevaert K. Falb M. Tebbe A. Konstantinidis K. Bisle B. Klein C. Martens L. Staes A. Timmerman E. Van Damme J. Siedler F. Pfeiffer F. Vandekerckhove J. Oesterhelt D. Large-scale identification of N-terminal peptides in the halophilic archaea Halobacterium salinarum and Natronomonas pharaonis. J. Proteome Res. 2007; 6: 2195-2204). Previously, we presented PROTEOFORMER, a tool that creates protein sequence databases for proteomics-based identification from translation initiation data obtained by ribosome profiling (ref. 8). All TIS identified by ribo-seq can then be matched with Nt-proteomics data (ref. 8; 18. Raj A. Wang S.H. Shim H. Harpak A. Li Y.I. 
Engelmann B. Stephens M. Gilad Y. Pritchard J.K. Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. Elife. 2016; 5: e13328; ref. 19) to improve protein identification rates. Although entire genome translation databases are criticized because they suffer from the "needle in the haystack" problem (refs. 2, 20; 30. Blakeley P. Overton I.M. Hubbard S.J. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J. Proteome Res. 2012; 11: 5221-5234), especially in the case of eukaryotes, a rationalized reduction of database size benefits the sensitivity for identifying novel peptides or proteins (ref. 2; 30. Blakeley P. Overton I.M. Hubbard S.J. 
Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J. Proteome Res. 2012; 11: 5221-5234). Here, we constructed an Nt peptide database tailored for searching Nt-proteomics data and permitting genome-wide searches for TIS without causing a drastic increase in the peptide search space. After applying data and feature dependent selection criteria, several newly identified N termini were confirmed by ribosome profiling data and other types of supportive metadata.
A. thaliana cell suspension cultures ecotype Landsberg erecta (Plant Systems Biology-Light, Arabidopsis Biological Resource Center stock CCL84841) were cultured as described (31. Van Leene J. Stals H. Eeckhout D. Persiau G. Van De Slijke E. Van Isterdael G. De Clercq A. Bonnet E. Laukens K. Remmerie N. Henderickx K. De Vijlder T. Abdelkrim A. Pharazyn A. Van Onckelen H. Inze D. Witters E. De Jaeger G. A tandem affinity purification-based technology platform to study the cell cycle interactome in Arabidopsis thaliana. Mol. Cell. Proteomics. 2007; 6: 1226-1238). The cells were subcultured every week in fresh medium at a 1:10 dilution in 500 ml conical flasks and shaken at 125 rpm at 25 °C in an orbital shaker under continuous light (50 μE). Two days after subculturing, cell suspensions were harvested for ribosome profiling or proteomics analyses as described below. For Nt-COFRADIC analysis, 50 ml of cell suspensions at 1–2% packed cell volume (PCV) were collected on a Whatman® nylon membrane filter (pore size 0.45 μm) using sintered glass filtration, followed by an ice-cold PBS wash. Collected cells were snap frozen in liquid nitrogen and frozen cell pellets were ground into a fine powder using a liquid nitrogen cooled pestle and mortar. 
The frozen powder was thawed in 10 ml ice-cold buffer (50 mM sodium phosphate pH 7.5, 100 mM NaCl and 1 × cOmplete™, EDTA-free protease inhibitor mixture (Roche, Basel, Switzerland)), left on ice for 10 min and the mixture was subjected to one additional cycle of freeze-thawing. Cell debris was eliminated by centrifugation at 16,000 × g for 15 min at 4 °C. The supernatant was recovered and the protein concentration determined using the DC Protein Assay Kit from Bio-Rad (Munich, Germany). For all proteome analyses performed, 3 mg of protein material (corresponding to about 1 ml of lysate) was subjected to Nt-COFRADIC analysis as described previously (32. Staes A. Impens F. Van Damme P. Ruttens B. Goethals M. Demol H. Timmerman E. Vandekerckhove J. Gevaert K. Selecting protein N-terminal peptides by combined fractional diagonal chromatography. Nat. Protoc. 2011; 6: 1130-1141); however, in the case of the endoproteinase Glu-C, Asp-N, and chymotrypsin digests, no strong cation exchange (SCX) pre-fractionation was performed. In one of the two tryptic replicates, the SCX pre-fractionation was omitted. To enable the assignment of in vivo Nt-acetylation events, all primary protein amines were blocked prior to digestion using an N-hydroxysuccinimide ester of 13C2D3-acetate. Proteomes were digested overnight at 37 °C using mass spectrometry grade trypsin (enzyme/substrate of 1/100, w/w; Promega, Madison, WI), chymotrypsin (1/60, w/w; Promega), endoproteinase Glu-C (1/75, w/w; Thermo Fisher Scientific, Bremen, Germany) or endoproteinase Asp-N (1/200, w/w; Promega) while mixing at 550 rpm. The resulting peptide mixtures were enriched for N-terminal peptides by diagonal chromatography as part of the actual COFRADIC sorting procedure. 
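To make concrete which peptide each digest contributes to the N-terminal fraction, the following Python sketch derives the N-terminal peptide of a protein for the four proteases named above. The cleavage rules are simplified assumptions for illustration and are not taken from the paper: trypsin is treated as cleaving after Arg only (on the assumption that acetyl-blocked lysines are not cleaved), chymotrypsin after F/Y/W/L, Glu-C after E, and Asp-N before D; proline effects and missed cleavages are ignored. The names `CLEAVAGE_RULES` and `n_terminal_peptide` are hypothetical.

```python
# Simplified cleavage rules (illustrative assumptions, not the paper's settings).
# "C": cleave C-terminal to the residue; "N": cleave N-terminal to the residue.
CLEAVAGE_RULES = {
    "trypsin": ("R", "C"),         # lysines assumed blocked, so Arg-C-like
    "chymotrypsin": ("FYWL", "C"),
    "glu-c": ("E", "C"),
    "asp-n": ("D", "N"),
}


def n_terminal_peptide(protein: str, enzyme: str) -> str:
    """Return the peptide from the protein N terminus up to the first cleavage site."""
    residues, side = CLEAVAGE_RULES[enzyme]
    for i, aa in enumerate(protein):
        if aa not in residues:
            continue
        if side == "C":
            return protein[: i + 1]
        if i > 0:  # a cut before position 0 would give an empty peptide
            return protein[:i]
    return protein  # no cleavage site: the whole protein is the peptide


if __name__ == "__main__":
    toy_protein = "MASKDELRFGH"  # made-up sequence
    for enzyme in CLEAVAGE_RULES:
        print(enzyme, n_terminal_peptide(toy_protein, enzyme))
```

Using several proteases in parallel yields N-terminal peptides of different lengths for the same protein, which raises the chance that at least one of them falls in a range suitable for MS/MS identification.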
More specifically, in between two identical reverse-phase peptide separations, internal peptides are reacted with 2,4,6-trinitrobenzenesulfonic acid (TNBS), rendering them more hydrophobic and thereby causing them to shift away from the unmodified N-terminal peptides during the second chromatographic separation. By the addition of H2O2 to a final concentration of 0.5% for 30 min at 30 °C, a methionine oxidation step was also introduced between the first RP-HPLC separation and the series of secondary RP-HPLC separations, thereby shifting all methionine-containing Nt-peptides to earlier elution times, allowing their enrichment (33. Van Damme P. Van Damme J. Demol H. Staes A. Vandekerckhove J. Gevaert K. A review of COFRADIC techniques targeting protein N-terminal acetylation. BMC Proceedings. 2009; 3: S6). The obtained fractions enriched for protein N termini were introduced into an LC-MS/MS system, an Ultimate 3000 (Dionex, Amsterdam, The Netherlands) in-line connected to an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific), and LC-MS/MS analysis was performed as described previously (34. Arnesen T. Van Damme P. Polevoda B. Helsens K. Evjenth R. Colaert N. Varhaug J.E. Vandekerckhove J. Lillehaug J.R. Sherman F. Gevaert K. Proteomics analyses reveal the evolutionary conservation and divergence of N-terminal acetyltransferases from yeast and humans. Proc. Natl. Acad. Sci. U.S.A. 2009; 106: 8157-8162; 35. Van Damme P. Hole K. Pimenta-Marques A. Helsens K. Vandekerckhove J. Martinho R.G. Gevaert K. Arnesen T. NatF contributes to an evolutionary shift in protein N-terminal acetylation and is important for normal chromosome segregation. PLoS Genet. 2011; 7: e1002169). MS/MS peak lists were searched in parallel using three mass spectrometry search engines and with identical parameter settings when possible. 
A multistage search strategy was used: MS/MS spectra were first searched against the Arabidopsis proteome database (TAIR10, containing 35,386 entries; http://www.arabidopsis.org/), and unidentified spectra were used as input for a second MS/MS search against a customized peptide database (see next paragraph). The search engines used were COMET (36. Eng J.K. Jahan T.A. Hoopmann M.R. Comet: an open-source MS/MS sequence database search tool. Proteomics. 2013; 13: 22-24; version 2016.01 rev. 2), Crux (37. Park C.Y. Klammer A.A. Kall L. MacCoss M.J. Noble W.S. Rapid and accurate peptide identification from tandem mass spectra. J. Proteome Res. 2008; 7: 3022-3027; version 2.1.16866) and MS-GF+ (38. Kim S. Pevzner P.A. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat. Commun. 2014; 5: 5277; version 2016.06.29). Mass tolerance on precursor ions was set to 10 ppm and on fragment ions to 0.5 Da. Peptide length was 7 to 40 amino acids. Semispecific enzyme settings adjusted to the enzyme and to the avai
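The search tolerances quoted above (10 ppm on precursor ions, 0.5 Da on fragment ions, peptide length 7 to 40 residues) can be expressed as simple acceptance checks. The Python sketch below only illustrates what these thresholds mean; it is not how COMET, Crux or MS-GF+ implement them, and all function names are hypothetical.

```python
PRECURSOR_TOL_PPM = 10.0   # precursor mass tolerance from the text
FRAGMENT_TOL_DA = 0.5      # fragment mass tolerance from the text
MIN_LEN, MAX_LEN = 7, 40   # allowed peptide length from the text


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed: float, theoretical: float,
                      tol_ppm: float = PRECURSOR_TOL_PPM) -> bool:
    """Relative tolerance: the allowed absolute error scales with the mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_matches(observed: float, theoretical: float,
                     tol_da: float = FRAGMENT_TOL_DA) -> bool:
    """Absolute tolerance in Dalton, independent of the fragment mass."""
    return abs(observed - theoretical) <= tol_da


def length_ok(peptide: str) -> bool:
    """Keep only candidate peptides within the searched length range."""
    return MIN_LEN <= len(peptide) <= MAX_LEN


if __name__ == "__main__":
    # 0.004 on a value of 500 is an 8 ppm deviation, inside the 10 ppm window.
    print(round(ppm_error(500.004, 500.0), 2), precursor_matches(500.004, 500.0))
    print(length_ok("PEPTIDE"), length_ok("PEPTID"))
```

The contrast between the relative precursor tolerance and the much wider absolute fragment tolerance mirrors the settings given in the text for the two mass measurements.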