OpenPepXL: An Open-Source Tool for Sensitive Identification of Cross-Linked Peptides in XL-MS

Article, open access, peer-reviewed

2020; Elsevier BV; Volume: 19; Issue: 12; Language: English

10.1074/mcp.tir120.002186

ISSN

1535-9484

Authors

Eugen Netz, Tjeerd M. H. Dijkstra, Timo Sachsenberg, Lukas Zimmermann, Mathias Walzer, Thomas Monecke, Ralf Ficner, Olexandr Dybkov, Henning Urlaub, Oliver Kohlbacher

Topic(s)

RNA and protein synthesis mechanisms

Abstract

Cross-linking MS (XL-MS) has been recognized as an effective source of information about protein structures and interactions. In contrast to regular peptide identification, XL-MS has to deal with a quadratic search space, where peptides from every protein could potentially be cross-linked to any other protein. To cope with this search space, most tools apply different heuristics for search space reduction. We introduce a new open-source XL-MS database search algorithm, OpenPepXL, which offers increased sensitivity compared with other tools. OpenPepXL searches the full search space of an XL-MS experiment without using heuristics to reduce it. Because of efficient data structures and built-in parallelization OpenPepXL achieves excellent runtimes and can also be deployed on large compute clusters and cloud services while maintaining a slim memory footprint. We compared OpenPepXL to several other commonly used tools for identification of noncleavable labeled and label-free cross-linkers on a diverse set of XL-MS experiments. In our first comparison, we used a data set from a fraction of a cell lysate with a protein database of 128 targets and 128 decoys. At 5% FDR, OpenPepXL finds from 7% to over 50% more unique residue pairs (URPs) than other tools. On data sets with available high-resolution structures for cross-link validation OpenPepXL reports from 7% to over 40% more structurally validated URPs than other tools. Additionally, we used a synthetic peptide data set that allows objective validation of cross-links without relying on structural information and found that OpenPepXL reports at least 12% more validated URPs than other tools. It has been built as part of the OpenMS suite of tools and supports Windows, macOS, and Linux operating systems. OpenPepXL also supports the MzIdentML 1.2 format for XL-MS identification results. It is freely available under a three-clause BSD license at https://openms.org/openpepxl. 
Cross-Linking Mass Spectrometry (XL-MS) has proven to be a valuable tool in studying the structures and interactions of proteins (1. Liu F. Heck A.J. Interrogating the architecture of protein assemblies and protein interaction networks by cross-linking mass spectrometry. Curr. Opin. Struct. Biol. 2015; 35: 100-108; 2. Sinz A. Arlt C. Chorev D. Sharon M. Chemical cross-linking and native mass spectrometry: A fruitful combination for structural biology. Protein Sci. 2015; 24: 1193-1209; 3. Leitner A. Faini M. Stengel F. Aebersold R. Crosslinking and mass spectrometry: an integrated technology to understand the structure and function of molecular machines. Trends Biochem. Sci. 2016; 41: 20-32; 4. O'Reilly F.J. Rappsilber J. Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology. Nat. Struct. Mol. Biol. 2018; 25: 1000-1008; 5. Chavez J.D. Bruce J.E. Chemical cross-linking with mass spectrometry: a tool for systems structural biology. Curr. Opin. Chem. Biol. 2019; 48: 8-18). Although XL-MS is maturing as a very useful method, there is room for improvement at every step of the workflow. In particular, the enrichment of cross-linked peptides derived from cross-linked protein samples has profound effects on the XL-MS analysis, on the subsequent computational identification, and on the FDR statistics of annotated MS2 spectra. In many XL-MS experiments the samples still contain a vast number of non-cross-linked, i.e. linear, peptides; consequently, cross-linked peptides usually occur with low intensities and are thus less likely to be selected for fragmentation in data-dependent acquisition.
Therefore, precursor and fragment spectra of relatively few cross-links must be identified among a large set of spectra from unmodified peptides. This is one of the issues that make the statistics for post-processing and filtering XL-MS data more difficult than for the identification of linear peptides. Fragment spectra of cross-linked peptides are also more difficult to annotate as they contain fragments from two peptides. Scoring the whole cross-link fragment spectrum match might result in identifications where one peptide sequence is covered by many fragment ions whereas the second peptide is identified only by its precursor mass and very few matching fragment ions. Reliable identification of one of the peptide sequences does not depend on correct identification of the other sequence. It is possible to have an identification of a cross-linked peptide pair with a high score in a database search where the high score is based on a legitimate good match to one correct peptide, but with a bad match to the second peptide. Reliable identification of a cross-link that is intended to be useful for modeling a protein structure or complex requires correct identifications for both peptides, and hence the whole identification can only be as good as the identification of the worse of the two peptides (6. Trnka M.J. Baker P.R. Robinson P.J.J. Burlingame A.L. Chalkley R.J. Matching cross-linked peptide spectra: only as good as the worse identification. Mol. Cell. Proteomics. 2014; 13: 420-434).

The search for two peptides in each fragment spectrum also has implications for the performance of XL-MS identification software. For a given precursor mass in conventional MS-based protein identification, the length of a possibly matching linear peptide can be roughly estimated by applying an 'averagine' model (7. Senko M.W. Beu S.C. McLafferty F.W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 1995; 6: 229-233). The number of candidates to be considered for matching peptides in database search primarily depends on the width of the precursor mass tolerance window and the size of the protein database. In XL-MS the mass is distributed across two peptides and only the sum of their masses plus the mass of the cross-linker is known. The computational search space contains all possible combinations of cross-linked peptides whose sum of masses lies within the precursor mass window. Searching all combinations of peptides rather than just linearly scanning all peptides requires efficient algorithms to perform a search on acceptable time scales. The most obvious solution used by some XL-MS search tools is a brute-force enumeration of all peptide-peptide pairs and filtering them by precursor mass (8. Gotze M. Pettelkau J. Schaks S. Bosse K. Ihling C.H. Krauth F. Fritzsche R. Kühn U. Sinz A. StavroX - a software for analyzing crosslinked products in protein interaction studies. J. Am. Soc. Mass Spectrom. 2012; 23: 76-87). Searches can be sped up by using stable-isotope labeled cross-linkers in the cross-linking experiment (9. Rinner O. Seebacher J. Walzthoeni T. Mueller L.N. Beck M. Schmidt A. Mueller M. Aebersold R. Identification of cross-linked peptides from large sequence databases. Nat. Methods. 2008; 5: 315-318; 10. Leitner A. Walzthoeni T. Aebersold R. Lysine-specific chemical cross-linking of protein complexes and identification of cross-linking sites using LC-MS/MS and the xQuest/xProphet software pipeline. Nat. Protoc. 2014; 9: 120-137).
Such labeling makes cross-linked spectra easily identifiable on the MS1 level and thus reduces the number of corresponding MS2 spectra to be searched by the database search tool. Several conventional linear peptide search tools (11. Kong A.T. Leprevost F.V. Avtonomov D.M. Mellacheruvu D. Nesvizhskii A.I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods. 2017; 14: 513-520) as well as xQuest (9; 10; 12. Walzthoeni T. Claassen M. Leitner A. Herzog F. Bohn S. Förster F. Beck M. Aebersold R. False discovery rate estimation for cross-linked peptides identified by mass spectrometry. Nat. Methods. 2012; 9: 901-903) and pLink2 (13. Chen Z.-L. Meng J.-M. Cao Y. Yin J.-L. Fang R.-Q. Fan S.-B. Liu C. Zeng W.-F. Ding Y.-H. Tan D. Wu L. Zhou W.-J. Chi H. Sun R.-X. Dong M.-Q. He S.-M. A high-speed search engine pLink 2 with systematic evaluation for proteome-scale identification of cross-linked peptides. Nat. Commun. 2019; 10: 3404) use pre-calculated fragment ion indices to retrieve peptides from the protein database based on observed fragment ions. Just like StavroX (8), xQuest fragments and scores pairs of peptides at a time. Therefore, the use of labeled linkers combined with an ion index limits its computational memory consumption and makes it applicable to large protein databases.

Another method for reducing the large search space is to use multi-pass scoring. A first scoring step based on a quick heuristic or a partial score can substantially reduce the number of candidates subjected to full scoring, thus reducing the overall runtime. For example, Kojak (14. Hoopmann M.R. Zelter A. Johnson R.S. Riffle M. MacCoss M.J. Davis T.N. Moritz R.L. Kojak: efficient analysis of chemically cross-linked protein complexes. J. Proteome Res. 2015; 14: 2190-2198), XiSearch (15. Fischer L. Rappsilber J. Quirks of error estimation in cross-linking/mass spectrometry. Anal. Chem. 2017; 89: 3829-3833), and pLink2 (13) start with a linear peptide search using an open-modification search strategy. Kojak uses a few hundred of the top-scoring peptides and combines them into pairs fitting the precursor mass, whereas XiSearch and pLink2 only keep a certain number of these and search the entire database again for the second peptide. The existing algorithms constrain the search space for their full scoring. That means they do not apply their final, most discriminative score to every candidate cross-link within the precursor tolerance window.
This might prematurely dismiss some candidate peptides that would have a high score as a peptide pair and reduce sensitivity in favor of efficiency. It was previously shown that one of the two peptides of a correctly identified cross-link might not be found within the first few hundred or even thousand peptides by pre-scoring linear peptides (16. Dai J. Jiang W. Yu F. Yu W. Xolik: finding cross-linked peptides with maximum paired scores in linear time. Bioinformatics. 2019; 35: 251-257). Our own experiments have also shown that it is not rare to find thousands of peptide pairs with at least 3 matched fragments for each peptide for one fragment spectrum and a medium-sized database of fewer than 500 proteins (data not shown).

Sensitivity is defined as the proportion of real cross-links in a data set identified by a search tool. Unfortunately, it is difficult to calculate the true number of real cross-links in a data set, because the crystal structures are often incomplete, especially for the larger complexes. The theoretical number of possible cross-links for most protein complexes is very high and only a small fraction of them is usually identified. Also, this number is the same for any fixed sample or searched database and does not affect the comparison of tools. Therefore, in this study we use the number of reported cross-links from the target protein database given a fixed FDR threshold as a substitute for the real sensitivity of a search.

In this work we introduce OpenPepXL, an efficient open-source software for identification of cross-linked peptides in fragment mass spectra. It is based on a full exploration of all possible candidate cross-link peptide pairs for each precursor mass in order to achieve high sensitivity, but because of efficient index data structures and search algorithms, it can achieve much improved runtimes.
OpenPepXL supports both labeled and label-free, mono- and heterobifunctional noncleavable cross-linkers. It is based on the OpenMS software framework (17. Rost H.L. Sachsenberg T. Aiche S. Bielow C. Weisser H. Aicheler F. Andreotti S. Ehrlich H.-C. Gutenbrunner P. Kenar E. Liang X. Nahnsen S. Nilse L. Pfeuffer J. Rosenberger G. Rurik M. Schmitt U. Veit J. Walzer M. Wojnar D. Wolski W.E. Schilling O. Choudhary J.S. Malmström L. Aebersold R. Reinert K. Kohlbacher O. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat. Methods. 2016; 13: 741-748) and makes use of multi-core architectures using the OpenMP API. OpenPepXL is part of The OpenMS Proteomics Pipeline (TOPP) that includes tools for labeled and label-free quantification, pre- and post-processing, and visualization of spectra and identification data. It can be installed on all major operating systems (Windows, macOS, and Linux) and is compatible with most computing clusters and cloud services for large-scale data analysis. It can be run as a command-line tool with a preconfigured file containing the settings, or as part of a workflow built using the graphical user interface of the free to use KNIME Analytics Platform (18. Berthold M.R. KNIME: The Konstanz Information Miner. In: Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg, 2008). OpenPepXL supports several output formats for XL-MS identification data such as the MzIdentML 1.2 format (19. Vizcaino J.A. Mayer G. Perkins S. Barsnes H. Vaudel M. Perez-Riverol Y. Ternent T. Uszkoreit J. Eisenacher M. Fischer L. Rappsilber J. Netz E. Walzer M. Kohlbacher O. Leitner A. Chalkley R.J. Ghali F. Martínez-Bartolomé S. Deutsch E.W. Jones A.R. The mzIdentML data standard version 1.2, supporting advances in proteome informatics. Mol. Cell. Proteomics. 2017; 16: 1275-1285), the xQuest XML output format and simple text-based tabular formats. The output can, therefore, be easily integrated into many existing XL-MS data analysis pipelines and is also compatible with the public repository PRIDE (20. Riverol Y. Csordas A. Bai J. Bernal-Llinares M. Hewapathirana S. Kundu D.J. Inuganti A. Griss J. Mayer G. Eisenacher M. Pérez E. Uszkoreit J. Pfeuffer J. Sachsenberg T. Yilmaz S. Tiwary S. Cox J. Audain E. Walzer M. Jarnuczak A.F. Ternent T. Brazma A. Vizcaíno J.A. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 2019; 47: D442-D450) which is part of ProteomeXchange (21. Deutsch E.W. Csordas A. Sun Z. Jarnuczak A. Perez-Riverol Y. Ternent T. Campbell D.S. Bernal-Llinares M. Okuda S. Kawano S. Moritz R.L. Carver J.J. Wang M. Ishihama Y. Bandeira N. Hermjakob H. Vizcaíno J.A. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 2017; 45 (gkw936): D1100-D1106). We compare OpenPepXL to other commonly used tools for identification of noncleavable cross-linkers (pLink2 (13), XiSearch (15), Kojak (14), StavroX (8), and xQuest (9)) on a diverse set of XL-MS experiments and show that it tends to be more sensitive while still achieving very good runtimes. OpenPepXL is available under a three-clause BSD license at https://www.openms.de/openpepxl/.

OpenPepXL belongs to the category of algorithms that score an entire candidate molecule of two peptides covalently linked with a cross-linker against an experimental spectrum without doing an open-modification search for linear peptides first. In this sense it has more in common with xQuest (9) and StavroX (8) than with pLink2 (13), Kojak (14), or XiSearch (15). OpenPepXL keeps a list of all linear peptides with modifications and their masses after in silico digestion of the protein database. The candidate peptide pairs are then enumerated for each MS2 spectrum precursor mass (Fig. 1). This way only the necessary pairs are created. By using the indices of the linear peptide table to reference the peptides in a pair, only a minimal amount of additional memory is required for this candidate peptide pair enumeration. Loop-links and mono-links are also considered in this step. Then theoretical spectra containing all linear and cross-linked fragments expected from the peptide pair are generated. By default, b- and y-ion series including neutral losses of NH3 and H2O are considered, but a-, c-, x- and z-ions can also be generated to accommodate different fragmentation methods. A spectrum matching algorithm matches peaks between these theoretical and the experimental spectra. From the number of matched peaks the match-odds score for a candidate peptide pair is calculated (more on the score below).

For experiments using labeled cross-linkers a few additional pre-processing steps are necessary. To pair MS2 spectra of the same peptide pairs linked by light and heavy isotope labeled cross-linkers the MS1 features across mass traces and retention time have to be detected and paired. We use the OpenMS tool for MS1 labeling (FeatureFinderMultiplex) to detect pairs of MS1 features from light and heavy cross-links based on the characteristic mass shift.
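The candidate enumeration described earlier in this section (a table of linear peptides, with pairs referenced by table index and generated per precursor mass) can be sketched as follows. This is a minimal illustration and not OpenPepXL's actual implementation: the function name, the sorted mass list with binary search, the ppm tolerance handling and the example masses are all assumptions made for the example, and loop-links and mono-links are left out.

```python
from bisect import bisect_left, bisect_right


def enumerate_candidate_pairs(peptide_masses, precursor_mass, linker_mass, tol_ppm):
    """Return index pairs (i, j) with i <= j into a sorted peptide mass list.

    A pair is kept when mass[i] + mass[j] + linker_mass lies within the
    precursor tolerance window. Hypothetical helper, for illustration only.
    """
    tol = precursor_mass * tol_ppm * 1e-6  # absolute tolerance in mass units
    pairs = []
    for i, m_alpha in enumerate(peptide_masses):
        # Mass window that the partner peptide has to fall into.
        lo_mass = precursor_mass - linker_mass - m_alpha - tol
        hi_mass = precursor_mass - linker_mass - m_alpha + tol
        if hi_mass < m_alpha:
            # Any partner would be lighter than peptide i, so the pair was
            # already produced in an earlier iteration; heavier i cannot match.
            break
        start = max(i, bisect_left(peptide_masses, lo_mass))
        end = bisect_right(peptide_masses, hi_mass)
        # Store only indices, not sequences, to keep memory use small.
        pairs.extend((i, j) for j in range(start, end))
    return pairs


if __name__ == "__main__":
    # Made-up peptide masses, sorted ascending; 138.068 is used here as an
    # illustrative cross-linker mass.
    masses = [500.0, 700.0, 800.0, 1000.0, 1200.0]
    print(enumerate_candidate_pairs(masses, 1638.068, 138.068, tol_ppm=10.0))
    # prints [(0, 3), (1, 2)]
```

Only pairs that fit the precursor window are ever created, and each pair costs two integers, which is the memory argument made in the text.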
Once the feature pairs are detected, OpenPepXL maps MS2 spectrum precursors to their respective features. MS2 spectra mapped to feature pairs are then paired up and processed (Fig. 2) to get peak sets from linear and cross-linked fragments with reduced noise. When matching theoretical spectra against these peak sets, only linear theoretical fragment peaks are matched against the experimental linear peaks and vice versa. This preprocessing step is derived from the xQuest algorithm and focuses the matching and scoring on smaller sets of peaks to reduce the chance of false-positive peak matches. The scores of the linear and cross-linked ion matches are combined into one score before the ranking and filtering of candidates.

The match-odds score used in OpenPepXL is based on the score of the same name from the xQuest algorithm (9). It is based on the probability of a random match between any peak from the experimental fragment ion spectrum and any peak in the theoretical fragment ion spectrum, given the mass tolerance window tol, mass range r, the number of peaks in the theoretical fragment spectrum s and the number of considered charges for all theoretical peaks c. The probability of one random match to a fragment ion peak is calculated as:

$$p = 1 - \left(1 - \frac{2 \times \mathrm{tol}}{\frac{1}{2} r}\right)^{s/c} \tag{1}$$

The cumulative distribution function of a binomial distribution with sample size s and probability p is used to determine the probability of getting more than k matched peaks between the experimental and theoretical fragment ion spectra by random chance:

$$P(X > k) = \sum_{i=k+1}^{s} \binom{s}{i} p^{i} (1 - p)^{s - i} \tag{2}$$

This probability decreases toward 0 for larger k, and a smaller probability denotes a better match, because it is less likely to have happened by chance.
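The two formulas above translate directly into a few lines of code. The sketch below is illustrative only: the function names and the example numbers are assumptions, tol and r are taken to be in the same mass unit, and it is not the OpenMS implementation.

```python
from math import comb


def random_match_probability(tol, mass_range, n_theo_peaks, n_charges):
    """Eq. 1: probability p of one random match to a fragment ion peak.

    tol is the mass tolerance, mass_range is r, n_theo_peaks is s and
    n_charges is c, following the symbols used in the text.
    """
    return 1.0 - (1.0 - (2.0 * tol) / (0.5 * mass_range)) ** (n_theo_peaks / n_charges)


def tail_probability(k, n_theo_peaks, p):
    """Eq. 2: P(X > k) for X following a binomial distribution with size s and probability p."""
    s = n_theo_peaks
    return sum(comb(s, i) * p**i * (1.0 - p) ** (s - i) for i in range(k + 1, s + 1))


if __name__ == "__main__":
    # Made-up example values: 0.2 mass units tolerance, mass range 1600,
    # 40 theoretical peaks, 2 considered charge states.
    p = random_match_probability(tol=0.2, mass_range=1600.0, n_theo_peaks=40, n_charges=2)
    for k in (2, 5, 10):
        # More matched peaks give a smaller chance probability.
        print(k, tail_probability(k, 40, p))
```

The direct summation is fine for theoretical spectra with a few hundred peaks; for much larger s a log-space or library implementation of the binomial tail would be the safer choice.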
With the -log() function the probability is turned into a score with higher numbers denoting a better match:

$$m = -\log\left(P(X > k)\right) \tag{3}$$

We call this the match-odds score m, and it is combined with the precursor error pe (difference between theoretical and experimental precursor mass in ppm) in the following formula to get the final OpenPepXL score:

$$\mathrm{score} = 0.2 \cdot \log\left(10^{-7} + m\right) - 0.03 \cdot |pe| \tag{4}$$

This formula was determined by an agreement between a linear regression and a linear discriminant analysis done to find the best linear combination to separate target from decoy hits on several XL-MS data sets (refer to supplemental Methods for more details).

The trimeric complex of human CRM1, SNP1 and Ran carrying a Q69L mutation was cross-linked with bis(sulfosuccinimidyl)suberate (BS3) and injected into an EASY-nLC 1000 HPLC system coupled to a Q Exactive mass spectrometer (Thermo Fisher Scientific) in duplicates under three normalized collision energy (NCE) conditions using a 50-min method. MS1 and MS2 resolution were set to 70,000 and 17,500, respectively. The fifteen most abundant precursors with charges of 3-7 were selected for MS2 fragmentation at NCE 20, 24 or 28% (refer to the supplemental Methods for more details on the experimental procedure). For the protein database only the three UniProt sequences O14980, O95149 and P62826 were used. They were manually modified to reflect the modifications made during the protein expression and purification (22. Monecke T. Güttler T. Neumann P. Dickmanns A. Görlich D. Ficner R. Crystal structure of the nuclear export receptor CRM1 in complex with Snurportin1 and RanGTP. Science. 2009; 324: 1087-1091). The MS proteomics data including the modified protein sequences have been deposited to the ProteomeXchange Consortium (21) via the PRIDE (20) partner repository with the data set identifier PXD014359.

In addition to the CRM complex data set described above, three data sets were downloaded from public repositories or kindly provided to us by other laboratories. We chose a more complex publicly available data set derived from a BS3-cross-linked crude ribosomal fraction obtained by size exclusion chromatography of HEK293 cell lysate (ProteomeXchange ID PXD006131) (23. Kolbowski L. Mendes M.L. Rappsilber J. Optimizing the parameters governing the fragmentation of cross-linked peptides in a tribrid mass spectrometer. Anal. Chem. 2017; 89: 5311-5318). The resulting sample was a complex mixture of more than 1700 proteins, which were quantified by label-free quantification of linear peptides. Several protein databases were provided with this data set, starting from one containing the 32 most abundant proteins and doubling in size up to the 512 most abundant proteins. We searched the HCD-fragmented subset of this data set, consisting of about 170,000 HCD-fragmented MS2 spectra, against a database of the 128 most abundant proteins and 128 reversed-sequence decoys. Additionally, we analyzed a data set with labeled DSS-d0/d12 and PDH-d0/d10 (pimelic acid dihydrazide) cross-linkers.
Commercial Bovine Serum Albumin (BSA; Sigma-Aldrich) was cross-linked with labeled DSS or PDH cross-linker in separate experiments. Both samples were independently analyzed using HCD fragmentation and high-resolution MS/MS detection (Orbitrap Fusion Lumos) or ion trap CID fragmentation with low-resolution MS/MS detection (Orbitrap Elite). This data set was published previously as part of a larger study (24. Iacobucci C. Piotrowski C. Aebersold R. Amaral B.C. Andrews P. Bernfur K. Borchers C. Brodie N.I. Bruce J.E. Cao Y. Chaignepain S. Chavez J.D. Claverol S. Cox J. Davis T. Degliesposti G. Dong M.-Q. Edinger N. Emanuelsson C. Gay M. Götze M. Gomes-Neto F. Gozzo F.C. Gutierrez C. Haupt C. Heck A.J.R. Herzog F. Huang L. Hoopmann M.R. Kalisman N. Klykov O. Kukačka Z. Liu F. MacCoss M.J. Mechtle
