Article · Open access · Peer reviewed

Structural Analysis of Multiprotein Complexes by Cross-linking, Mass Spectrometry, and Database Searching

2007; Elsevier BV; Volume: 6; Issue: 12; Language: English

10.1074/mcp.m700274-mcp200

ISSN

1535-9484

Authors

Alessio Maiolica, Davide Cittaro, Dario Borsotti, Lau Sennels, Claudio Ciferri, Cataldo Tarricone, Andrea Musacchio, Juri Rappsilber

Topic(s)

Mass Spectrometry Techniques and Applications

Abstract

Most protein complexes are inaccessible to high resolution structural analysis. We report the results of applying a combined approach of cross-linking, mass spectrometry, and bioinformatics to two human complexes containing large coiled-coil segments, the NDEL1 homodimer and the NDC80 heterotetramer. So far, an important limitation of the cross-linking approach has been the identification of cross-linked peptides from fragmentation spectra. Our novel approach overcomes this data analysis bottleneck of cross-linking and mass spectrometry. We constructed a purpose-built database to match spectra with cross-linked peptides, defined a score that expresses the quality of our identification, and estimated false positive rates. We show that our analysis sheds light on critical structural parameters such as the directionality of the homodimeric coiled coil of NDEL1, the register of the heterodimeric coiled coils of the NDC80 complex, and the organization of a tetramerization region in the NDC80 complex. Our approach is especially useful for complexes that are difficult to address by standard structural methods.

Mass spectrometry-based proteomics is a powerful tool for the analysis of multiprotein complexes (ref. 1: Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207). Thousands of complexes have been isolated, and their protein compositions have been determined (ref. 2: Gavin A.C. Bosche M. Krause R. Grandi P. Marzioch M. Bauer A. Schultz J. Rick J.M. Michon A.M. Cruciat C.M. Remor M. Hofert C. Schelder M. Brajenovic M. Ruffner H. Merino A. Klein K. Hudak M. Dickson D. Rudi T. Gnau V. Bauch A. Bastuck S. Huhse B. Leutwein C. Heurtier M.A. Copley R.R. Edelmann A. Querfurth E. Rybin V. Drewes G. Raida M. Bouwmeester T. Bork P. Seraphin B. Kuster B. Neubauer G. Superti-Furga G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002; 415: 141-147; ref. 3: Ho Y. Gruhler A. Heilbut A. Bader G.D. Moore L. Adams S.L. Millar A. Taylor P. Bennett K. Boutilier K. Yang L. Wolting C. Donaldson I. Schandorff S. Shewnarane J. Vo M. Taggart J. Goudreault M. Muskat B. Alfarano C. Dewar D. Lin Z. Michalickova K. Willems A.R. Sassi H. Nielsen P.A. Rasmussen K.J. Andersen J.R. Johansen L.E. Hansen L.H. Jespersen H. Podtelejnikov A. Nielsen E. Crawford J. Poulsen V. Sorensen B.D. Matthiesen J. Hendrickson R.C. Gleeson F. Pawson T. Moran M.F. Durocher D. Mann M. Hogue C.W. Figeys D. Tyers M. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature.
2002; 415: 180-183; ref. 4: Gavin A.C. Aloy P. Grandi P. Krause R. Boesche M. Marzioch M. Rau C. Jensen L.J. Bastuck S. Dumpelfeld B. Edelmann A. Heurtier M.A. Hoffman V. Hoefert C. Klein K. Hudak M. Michon A.M. Schelder M. Schirle M. Remor M. Rudi T. Hooper S. Bauer A. Bouwmeester T. Casari G. Drewes G. Neubauer G. Rick J.M. Kuster B. Bork P. Russell R.B. Superti-Furga G. Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006; 440: 631-636). Although many complexes will feed into large-scale crystallization trials, only a few are likely to yield a structure. Many protein complexes are heterogeneous or insoluble at the concentrations needed for crystallization, or they yield crystals lacking the quality needed for structure determination. When structures are obtained, they often comprise only parts of the proteins, because difficult regions have been removed to improve solubility or crystallization properties.

Cross-linking in conjunction with mass spectrometry is a very promising tool for obtaining structural information on proteins and protein complexes that is difficult to obtain using standard structural methods (ref. 5: Sinz A. Chemical cross-linking and mass spectrometry to map three-dimensional protein structures and protein-protein interactions. Mass Spectrom. Rev. 2006; 25: 663-682). Just as mass spectra can reveal the identity of the protein components of a complex, mass spectra of a cross-linked complex or protein can be used to identify proteins that are in direct proximity within a complex (ref. 6: Rappsilber J. Siniossoglou S. Hurt E.C. Mann M. A generic strategy to analyze the spatial organization of multi-protein complexes by cross-linking and mass spectrometry. Anal. Chem. 2000; 72: 267-275) and to aid fold recognition of proteins (ref. 7: Young M.M. Tang N. Hempel J.C. Oshiro C.M. Taylor E.W.
Kuntz I.D. Gibson B.W. Dollinger G. High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Proc. Natl. Acad. Sci. U. S. A. 2000; 97: 5802-5806). Although this has been shown in proof-of-principle studies (refs. 6, 7), general application has yet to be achieved.

The success of mass spectrometry in identifying proteins is largely due to the apparent simplicity and the automation of protein identification from mass spectrometric data. Three features are central to this automation. (a) Based on an observed peptide mass, a list of candidate peptides can be extracted from protein databases. (b) The candidate peptides can be evaluated by assessing their match to the fragmentation spectrum, resulting in a single number, the score. (c) The rate of false identifications can be estimated by computing the likelihood of a random hit. Unfortunately, this straightforward automation procedure has so far not been applicable to cross-linked peptides. In the absence of automatic tools similar to those used for normal peptide identification, cross-linking cannot be used routinely for the structural analysis of multiprotein complexes. Indeed, work based on identifying cross-linked peptides has so far been limited to complexes composed of no more than two different proteins (ref. 5: Sinz A.
Chemical cross-linking and mass spectrometry to map three-dimensional protein structures and protein-protein interactions. Mass Spectrom. Rev. 2006; 25: 663-682). Standard database search tools cannot create a list of candidate cross-linked peptides based on the observed mass. A number of dedicated programs consider pairs of peptides that together contribute to the observed mass (ref. 7; ref. 8: Chen T. Jaffe J.D. Church G.M. Algorithms for identifying protein cross-links via tandem mass spectrometry. J. Comput. Biol. 2001; 8: 571-583; ref. 9: Back J.W. Sanz M.A. De Jong L. De Koning L.J. Nijtmans L.G. De Koster C.G. Grivell L.A. Van Der Spek H. Muijsers A.O. A structure for the yeast prohibitin complex: structure prediction and evidence from chemical crosslinking and mass spectrometry. Protein Sci. 2002; 11: 2471-2478; ref. 10: Taverner T. Hall N.E. O'Hair R.A. Simpson R.J. Characterization of an antagonist interleukin-6 dimer by stable isotope labeling, cross-linking, and mass spectrometry. J. Biol. Chem. 2002; 277: 46487-46492; ref. 11: Collins C.J. Schilling B. Young M. Dollinger G. Guy R.K. Isotopically labeled crosslinking reagents: resolution of mass degeneracy in the identification of crosslinked peptides. Bioorg. Med. Chem. Lett. 2003; 13: 4023-4026; ref. 12: Kruppa G.H. Schoeniger J. Young M.M. A top down approach to protein structural studies using chemical cross-linking and Fourier transform mass spectrometry. Rapid Commun. Mass Spectrom.
2003; 17: 155-162; ref. 13: Schilling B. Row R.H. Gibson B.W. Guo X. Young M.M. MS2Assign, automated assignment and nomenclature of tandem mass spectra of chemically crosslinked peptides. J. Am. Soc. Mass Spectrom. 2003; 14: 834-850; ref. 14: de Koning L.J. Kasper P.T. Back J.W. Nessen M.A. Vanrobaeys F. Van Beeumen J. Gherardi E. de Koster C.G. de Jong L. Computer-assisted mass spectrometric analysis of naturally occurring and artificially introduced cross-links in proteins and protein complexes. FEBS J. 2006; 273: 281-291; ref. 15: Seebacher J. Mallick P. Zhang N. Eddes J.S. Aebersold R. Gelb M.H. Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing. J. Proteome Res. 2006; 5: 2270-2282; ref. 16: Gao Q. Xue S. Doneanu C.E. Shaffer S.A. Goodlett D.R. Nelson S.D. Pro-CrossLink. Software tool for protein cross-linking and mass spectrometry. Anal. Chem. 2006; 78: 2145-2149; ref. 17: Anderson G.A. Tolic N. Tang X. Zheng C. Bruce J.E. Informatics strategies for large-scale novel cross-linking analysis. J. Proteome Res. 2007; 6: 3412-3421). The candidates then need to be validated on the basis of their match to fragmentation spectra. Currently this requires screening the spectra either completely manually or with software assistance (ref. 13; ref. 15: Seebacher J. Mallick P. Zhang N. Eddes J.S. Aebersold R. Gelb M.H. Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing. J.
Proteome Res. 2006; 5: 2270-2282; ref. 16; ref. 18: Petrotchenko E.V. Olkhovik V.K. Borchers C.H. Isotopically coded cleavable cross-linker for studying protein-protein interaction and protein complexes. Mol. Cell. Proteomics. 2005; 4: 1167-1179). No scoring system or algorithm has yet been developed to replace human intervention. Ideally, such a scoring algorithm would objectively separate true from false matches and attach a measure of confidence to the results.

Here we present an algorithm that automatically finds and validates cross-linked peptides using fragmentation spectra, thereby overcoming the key limitation in the analysis of protein cross-links. Proteins are cross-linked using a 1:1 mixture of stable isotope-labeled and non-labeled cross-linker to reduce the false positive rate of the process. Proteins are digested with trypsin, and the peptides are analyzed by LC-MS/MS prior to data analysis with our algorithm (see Fig. 1). The algorithm was applied to data acquired from two human coiled-coil complexes, the NDEL1-(17–174) homodimer (38 kDa) and the NDC80 heterotetramer (176 kDa). We used a standard database search tool, Mascot (ref. 19: Perkins D.N. Pappin D.J. Creasy D.M. Cottrell J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999; 20: 3551-3567), and a purpose-built cross-link database (XDB) that contains cross-linked peptides represented as single linear peptides. [Footnote 1: The abbreviations used are: XDB, cross-link database; BS2G, bis(sulfosuccinimidyl)glutarate; SIM, selected ion monitoring.]
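Representing a cross-linked pair as a single linear XDB entry rests on a simple mass identity. The sketch below illustrates it under stated assumptions and is not the authors' code: it uses standard monoisotopic residue masses and assumes elemental compositions of C5H4O2 for the BS2G bridge and C5H6O3 for the hydrolyzed (one-ended) linker. Under these assumptions, the linear peptide AB carrying one hydrolyzed cross-linker has exactly the mass of A and B joined by one bridge, and four deuterium atoms shift the mass by about 4.025 Da. The peptide sequences and all helper names are hypothetical.

```python
# Sketch: why a cross-linked pair A-x-B can be searched as a linear peptide AB.
# Assumptions (not from the paper): standard monoisotopic residue masses,
# BS2G bridge = C5H4O2, hydrolyzed BS2G = C5H6O3, heavy form = four 2H atoms.

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
CARBAMIDOMETHYL = 57.021464            # fixed modification on every Cys
BS2G_BRIDGE = 96.021129                # C5H4O2, added when two amines are linked
BS2G_HYDROLYZED = BS2G_BRIDGE + WATER  # C5H6O3, added when one end is hydrolyzed
D4_SHIFT = 4 * (2.014102 - 1.007825)   # heavy minus light, about 4.025 Da


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear peptide (Cys carbamidomethylated)."""
    residues = sum(RESIDUE_MASS[aa] for aa in seq)
    return residues + WATER + CARBAMIDOMETHYL * seq.count("C")


def crosslinked_mass(pep_a: str, pep_b: str, heavy: bool = False) -> float:
    """Two peptides joined by one BS2G molecule."""
    linker = BS2G_BRIDGE + (D4_SHIFT if heavy else 0.0)
    return peptide_mass(pep_a) + peptide_mass(pep_b) + linker


def concatenated_mass(pep_a: str, pep_b: str, heavy: bool = False) -> float:
    """Linear XDB entry AB carrying one hydrolyzed BS2G as a variable modification."""
    linker = BS2G_HYDROLYZED + (D4_SHIFT if heavy else 0.0)
    return peptide_mass(pep_a + pep_b) + linker


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def matches_within_ppm(observed: float, theoretical: float, tol_ppm: float = 4.5) -> bool:
    """Precursor matching window, here set to the 4.5 ppm mass accuracy from the text."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


if __name__ == "__main__":
    a, b = "AKLLDR", "GKVEAR"  # hypothetical peptides, for illustration only
    print(f"cross-linked A-x-B : {crosslinked_mass(a, b):.4f} Da")
    print(f"linear AB + hydrol.: {concatenated_mass(a, b):.4f} Da")
    print(f"heavy - light      : {crosslinked_mass(a, b, True) - crosslinked_mass(a, b):.4f} Da")
```

The two water terms explain the trick: fusing A and B into one chain removes one water, and the hydrolyzed linker adds it back, so a standard search engine that only knows linear peptides with variable modifications can still reach the cross-linked mass.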
Our algorithm assigns a score and describes the confidence of each match through comparison with negative controls.

The NDC80 complex was purified according to our published procedure (ref. 20: Ciferri C. De Luca J. Monzani S. Ferrari K.J. Ristic D. Wyman C. Stark H. Kilmartin J. Salmon E.D. Musacchio A. Architecture of the human Ndc80-Hec1 complex, a critical constituent of the outer kinetochore. J. Biol. Chem. 2005; 280: 29088-29095). NDEL1-(17–174) was purified as described earlier (ref. 21: Tarricone C. Perrina F. Monzani S. Massimiliano L. Kim M.H. Derewenda Z.S. Knapp S. Tsai L.H. Musacchio A. Coupling PAF signaling to dynein regulation: structure of LIS1 in complex with PAF-acetylhydrolase. Neuron. 2004; 44: 809-821) with the following changes. Polymerase chain reaction fragments of human NDEL1 corresponding to amino acids 17–174 of the full-length protein were subcloned into the pGEX6P-1 expression vector (GE Healthcare) and expressed in Escherichia coli strain BL21(DE3). The protein was purified by glutathione affinity chromatography. The GST tag was removed using PreScission protease (GE Healthcare), and the resulting sample was further purified by size exclusion chromatography on a Superdex 200 column equilibrated with 10 mM Hepes, pH 7.5, 100 mM NaCl. Cleavage of the GST fusion protein with PreScission protease leaves a 5-residue extension at the N terminus of NDEL1-(17–174), numbered −5 to −1.

Human NDEL1-(17–174) (29 μg of protein, equivalent to 775 pmol) and the human NDC80 complex (15 μg of protein, equivalent to 86 pmol) were mixed with a 100× excess of the isotope-labeled cross-linker bis(sulfosuccinimidyl)glutarate (BS2G) (Pierce) in a final volume of 150 μl of 10 mM Hepes, pH 7.5, 100 mM NaCl at room temperature. The cross-linker, a 1:1 mixture of light BS2G-d0 and heavy BS2G-d4, was freshly prepared as a 10 nmol/μl solution in DMSO.
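As a worked example of the reagent amounts, the volume of cross-linker stock follows directly from the stated quantities. This is a sketch under one assumption that the text does not spell out: "100× excess" is read as moles of BS2G (light plus heavy) per mole of the stated protein amount. The helper names are hypothetical.

```python
# Reagent arithmetic for the cross-linking step (illustrative helper names).
# Assumption: "100x excess" = moles of BS2G (d0 + d4) per mole of stated protein.

def pmol_from_ug(mass_ug: float, mw_kda: float) -> float:
    """Micrograms to picomoles: 1 ug of a 1 kDa protein is 1 nmol = 1000 pmol."""
    return mass_ug / mw_kda * 1000.0


def crosslinker_nmol(protein_pmol: float, molar_excess: float = 100.0) -> float:
    """Total cross-linker needed; half of it is d0 and half d4 in a 1:1 mix."""
    return protein_pmol * molar_excess / 1000.0


def stock_volume_ul(protein_pmol: float, molar_excess: float = 100.0,
                    stock_nmol_per_ul: float = 10.0) -> float:
    """Volume of the 10 nmol/ul BS2G stock in DMSO to add to the reaction."""
    return crosslinker_nmol(protein_pmol, molar_excess) / stock_nmol_per_ul


if __name__ == "__main__":
    for name, pmol in [("NDEL1-(17-174)", 775.0), ("NDC80 complex", 86.0)]:
        vol = stock_volume_ul(pmol)
        dmso = 100.0 * vol / 150.0  # DMSO share of the 150 ul reaction volume
        print(f"{name}: {crosslinker_nmol(pmol):.2f} nmol BS2G, "
              f"{vol:.2f} ul stock, {dmso:.1f}% DMSO")
    print(f"15 ug of a 176 kDa complex = {pmol_from_ug(15, 176):.1f} pmol")
```

Under this reading, NDEL1-(17–174) needs 77.5 nmol of BS2G (7.75 μl of stock, about 5% DMSO in the 150 μl reaction) and the NDC80 complex 8.6 nmol (0.86 μl). Converting 15 μg of the 176 kDa complex gives about 85 pmol, in line with the stated 86 pmol.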
The reaction was stopped after 30 min by adding 5 μl of 1 M ammonium bicarbonate. Sample buffer was added, and the samples were separated by SDS-PAGE on Novex NuPAGE 1-mm 4–12% Tris-glycine gels (Invitrogen) in MOPS buffer (Invitrogen); the gels were fixed in 50% methanol, 5% acetic acid and stained with the colloidal blue kit (Invitrogen). Bands were excised and processed following a standard trypsin digestion procedure (ref. 22: Shevchenko A. Wilm M. Vorm O. Mann M. Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Anal. Chem. 1996; 68: 850-858): reduction in 100 mM DTT for 30 min at room temperature, alkylation with 55 mM iodoacetamide for 30 min at room temperature in the dark, and digestion with 12.5 ng/μl trypsin (proteomics grade, Sigma) overnight at 37 °C. The supernatant was loaded onto StageTips (ref. 23: Rappsilber J. Ishihama Y. Mann M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 2003; 75: 663-670), and peptides were eluted in 20 μl of 80% acetonitrile, 0.1% trifluoroacetic acid. The acetonitrile was evaporated (Concentrator 5301, Eppendorf AG, Hamburg, Germany), and the volume of each eluate was adjusted to 5 μl with 1% trifluoroacetic acid, of which 2.5 μl (i.e. half) were injected for LC-MS/MS analysis.

The tryptic digests were analyzed by LC-MS/MS using an HPLC system (1100 binary nanopump, Agilent, Palo Alto, CA) coupled online to an ion trap-FTICR hybrid mass spectrometer (LTQ-FT, ThermoElectron, Bremen, Germany). C18 material (ReproSil-Pur C18-AQ 3 μm, Dr.
Maisch GmbH, Ammerbuch-Entringen, Germany) was packed into a spray emitter (75-μm inner diameter, 8-μm opening, 70-mm length; New Objective) using an air pressure pump (Proxeon Biosystems, Odense, Denmark) to prepare an analytical column with a self-assembled particle frit (ref. 24: Ishihama Y. Rappsilber J. Andersen J.S. Mann M. Microcolumns with self-assembled particle frits for proteomics. J. Chromatogr. A. 2002; 979: 233-239). Mobile phase A consisted of water, 5% acetonitrile, and 0.5% acetic acid, and mobile phase B consisted of acetonitrile and 0.5% acetic acid. Samples were loaded from an Agilent 1100 autosampler onto the column at a flow rate of 700 nl/min. The gradient was run at 300 nl/min, with buffer B increasing linearly from 0 to 20% over the first 77 min and then from 20 to 80% over a further 15 min. We used a SIM method for mass acquisition (ref. 25: Olsen J.V. Mann M. Improved peptide identification in proteomics by two consecutive stages of mass spectrometric fragmentation. Proc. Natl. Acad. Sci. U. S. A. 2004; 101: 13417-13422) with one low resolution FT-MS scan (fill target, 1,000,000 ions; resolution, 25,000; maximum fill time, 2 s; mass range, m/z 300–1575). The three most intense signals (dynamic exclusion for 180 s) were selected for SIM in the FTICR cell (fill target, 500,000 ions; maximum fill time, 50 ms; window, m/z 22) and for MS2/MS3 in the ion trap (normal scan; wideband activation; fill target, 10,000 ions; maximum fill time, 100 ms). Each cycle lasted ∼3 s.

In principle, precursor selection for MS/MS could be directed onto doublet signals, focusing fragmentation on candidate cross-linked peptides. However, the signals of cross-linked peptides are usually weak and of low quality in a full FT-MS spectrum, so an FT-SIM scan is necessary to observe both signals of a doublet reliably.
Directed selection of precursors would therefore require the fragmentation spectrum to be acquired after the SIM scan. However, on our LTQ-FT the MS/MS spectrum can be recorded in the ion trap in parallel with the SIM scan being recorded in the FT cell. Recording both spectra in parallel, regardless of the multiplicity of the precursor, is more time-efficient than recording first the SIM scan and then an MS/MS scan for peptides with doublet signals. The doublet information is instead used for post-acquisition data filtering. High mass accuracy FT-MS/MS would have a different time economy: because the MS/MS spectrum has to be recorded after the MS and SIM spectra, doublet-directed sequencing is then highly advisable.

Peaks were picked from the raw data files using DTAsupercharge (version 0.94, available from SourceForge) with the following settings: precursor mass deviation, m/z 0.08; smart picking for MS/MS activated; maximum search level, 8. The four peak lists, one for each band in the gel, contained the fragment information of 14,871 precursors in total. From these, a peak list was created for light precursors and a corresponding peak list for heavy precursors. The occurrence of isotopic doublets, which indicates the presence of the cross-linker, was used to enrich the dataset for spectra of cross-linked peptides. For the selection of doublets, all SIM scans were extracted from the raw data files by a custom program written in C# (.NET) using the XDA-api (Xcalibur Development Kit, Thermo Inc.). We then extracted the m/z, charge state, and scan number of every precursor in the complete list. The corresponding scan was located in the SIM file, and we determined whether the precursor had a partner signal with 0.4–2.5× its intensity at a mass difference of plus or minus 4.025 ± 0.01 Da.
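The doublet test can be sketched as follows. This is a hypothetical re-implementation for illustration, not the authors' C# program. It assumes that the 4.025 ± 0.01 Da criterion refers to the neutral mass and is converted to an m/z spacing by dividing by the precursor charge, and that the 0.4–2.5× window is the partner-to-precursor intensity ratio. Following the rule stated in the protocol, a partner above the precursor m/z marks a light candidate and a partner below marks a heavy candidate; the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set

LABEL_SHIFT_DA = 4.025        # d4 minus d0 mass difference used in the text
SHIFT_TOL_DA = 0.01
RATIO_MIN, RATIO_MAX = 0.4, 2.5


@dataclass
class Peak:
    mz: float
    intensity: float


def doublet_labels(precursor_mz: float, charge: int, precursor_intensity: float,
                   sim_peaks: Iterable[Peak]) -> Set[str]:
    """Return {'light'}, {'heavy'}, both, or an empty set for one precursor."""
    spacing = LABEL_SHIFT_DA / charge   # assumption: mass shift divided by charge
    tol = SHIFT_TOL_DA / charge
    labels: Set[str] = set()
    for peak in sim_peaks:
        ratio = peak.intensity / precursor_intensity
        if not RATIO_MIN <= ratio <= RATIO_MAX:
            continue                     # partner too weak or too strong
        if abs(peak.mz - (precursor_mz + spacing)) <= tol:
            labels.add("light")          # partner above: precursor carries BS2G-d0
        elif abs(peak.mz - (precursor_mz - spacing)) <= tol:
            labels.add("heavy")          # partner below: precursor carries BS2G-d4
    return labels


if __name__ == "__main__":
    sim = [Peak(500.0, 1.0e6), Peak(502.0125, 8.0e5)]  # hypothetical 2+ doublet
    print(doublet_labels(500.0, 2, 1.0e6, sim))
    print(doublet_labels(502.0125, 2, 8.0e5, sim))
```

Returning a set rather than a single label keeps the rare case of partners on both sides visible instead of silently picking one; precursors with an empty set are simply not passed on to the cross-link search.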
The partner intensity window was imposed because peptides containing the non-deuterated and the deuterated cross-linker show shifted elution profiles due to the difference in isotope composition of the two species. Depending on when the SIM scan is acquired relative to the elution profiles of the two peptides, this shift can result in partner peak intensity ratios that differ from 1:1. The threshold values were determined by inspecting SIM scans of peptides that we identified as being modified with the hydrolyzed cross-linker. If the matching signal was above the precursor m/z, the precursor was taken as a candidate peptide containing the light form of the cross-linker, and its MS/MS peak list was added to the peak list of light precursors. Conversely, if the matching signal was below the precursor m/z, the precursor was taken as a candidate peptide containing the heavy form of the cross-linker, and its MS/MS peak list was added to the peak list of heavy precursors. This process resulted in a total of 1452 queries, i.e. 10% of the acquired data.

The complete peak list for each band from the gel was searched with Mascot (version 2.0) against Swiss-Prot (www.expasy.org/sprot), to which the exact sequences of the recombinant proteins under investigation had been added, using the following parameters: monoisotopic mass values; peptide tolerance, 0.08 Da; MS/MS tolerance, 0.5 Da; instrument, ESI-TRAP; fully tryptic specificity; cysteine carbamidomethylation as fixed modification; oxidation on methionine and hydrolyzed cross-linker on protein N terminus, lysine, serine, and tyrosine as variable modifications; two missed cleavage sites allowed. The results of this first search were used in three ways. First, the peptides matching the protein complex members were used to estimate the mass accuracy of the analysis.
We took as the mass accuracy the mass deviation that included 97% of the identified peptides (588 peptides). This value was ∼4.5 ppm for all four sets of data; for comparison, the average deviation was 1.3 ppm. Second, we could assess the extent of side reactions of the amine-specific cross-linker with serine and tyrosine. We detected only a very small number of modified serine-containing peptides, and there was no indication of tyrosine modification. Therefore, we did not consider these modifications further in our analysis. Third, the identified proteins were selected for the construction of the XDB. Although we worked with a purified complex, several proteins besides the four expected ones were present in the sample, as judged from the gel; presumably they were contaminants from the expression system. One approach would be to identify all proteins present and to include them all in the XDB. This would, however, unnecessarily inflate the XDB, as we are only interested in those proteins actually found in the respective fraction we analyze. By searching Swiss-Prot for each analysis, we ensure that we consider exactly those proteins that can be detected in the respective gel band.

Identifying cross-linked peptides with a standard database search tool such as Mascot required a special database and a separate search. The selected proteins were digested in silico, allowing for up to two missed cleavages. The resulting peptides were filtered to retain those containing either an internal Lys or the protein N terminus, and were then joined in all possible pairwise combinations. It is essential to have both linear permutations of a peptide pair, i.e. AB and BA, to allow complete matching of fragments (see also Fig. 1 and "Results"). Creating one entry per peptide pair has the disadvantage of producing many short entries and occupying more memory than combining the peptides in linear succession.
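The digestion, filtering, and pairing steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: trypsin is assumed to cleave after K or R unless the next residue is P, and "internal Lys" is read as a lysine that is not the C-terminal residue of the peptide (a lysine acylated by the cross-linker is not a trypsin cleavage site). The function names and the toy sequence are hypothetical.

```python
import re
from itertools import product
from typing import Iterable, List, Tuple


def tryptic_peptides(protein: str, max_missed: int = 2) -> List[Tuple[int, str]]:
    """In silico trypsin digest returning (start index, peptide) tuples.

    Assumption: cleavage C-terminal to K or R unless the next residue is P.
    """
    cut_sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = [0] + [s for s in cut_sites if s < len(protein)] + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append((bounds[i], protein[bounds[i]:bounds[j]]))
    return peptides


def is_linkable(start: int, peptide: str) -> bool:
    """Protein N terminus, or a Lys that is not the peptide's C-terminal residue."""
    return start == 0 or "K" in peptide[:-1]


def xdb_pairs(proteins: Iterable[str], max_missed: int = 2) -> List[str]:
    """All ordered pairs AB of linkable peptides, written as linear sequences."""
    linkable = []
    for protein in proteins:
        for start, pep in tryptic_peptides(protein, max_missed):
            if is_linkable(start, pep):
                linkable.append(pep)
    linkable = list(dict.fromkeys(linkable))  # drop duplicates, keep order
    # product() yields AB and BA as well as AA (relevant for homodimers)
    return [a + b for a, b in product(linkable, repeat=2)]


if __name__ == "__main__":
    toy = "MAKGLKER"  # hypothetical sequence
    print(tryptic_peptides(toy))
    print(len(xdb_pairs([toy])), "ordered pairs")
```

For the toy sequence, the digest yields six peptides, of which four pass the filter (GLK and ER carry neither an internal lysine nor the N terminus), giving 16 ordered pairs.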
If a protein P gives a peptide set [a, b, c] and a protein Q gives a peptide set [A, B, C], then XDB would contain a single protein in which the peptides of proteins P and Q are concatenated into the single sequence caabacbbccCAABACBBCCcAaAbAcBaBbBcCaCbC. From this sequence, the search program creates all possible pairs in both permutations: aa, ab, ba, bb, bc, cb, cc, ca, ac, etc. Constructing the cross-link database in this way is a very condensed way of writing all possible pairwise combinations; in our example it requires 38 letters instead of the 72 needed to write the 36 ordered pairs separately, a compaction of almost 50%. Note that the peptides concatenated in XDB contain missed cleavage sites. The search will also create chimeric peptides containing parts of two original peptides; these are known false positives. The reversed cross-link database was obtained by inverting the entire cross-link database, i.e. writing the sequence from C terminus to N terminus.

The cross-link database was searched with the peak lists of light and heavy precursors using Mascot with the following parameters: monoisotopic mass values; peptide tolerance, 0.08 Da; MS/MS tolerance, 0.5 Da; instrument, ESI-TRAP; fully tryptic specificity; cysteine carbamidomethylation as fixed modification; light or heavy hydrolyzed cross-linker and oxidation on methionine as variable modifications; five missed cleavage sites allowed. For the second control, which uses a wrong mass for the cross-linker, 3 Da were added to the correct masses of the heavy and light cross-linker in the Mascot modification file. The database search retrieves the peptides that match the observed mass. Mascot does not return all candidates but only those considered non-random based on an initial matching of fragments. [Footnote 2: J. Cottrell, personal communication.] Mascot thus already uses some fragment information to select candidates of higher value than would be obtained on the basis of the measured peptide mass alone.
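The compaction of the XDB can be checked by listing the neighbouring pairs in a concatenated sequence, treating each peptide as one symbol. The sketch below confirms that the example sequence for proteins P and Q contains every ordered pair of the six peptides. As an alternative construction, which is not the one described in the paper, it also builds a de Bruijn sequence of order 2, which covers all k² ordered pairs of k peptides in k² + 1 symbols. The decoy is obtained by reversing the whole sequence, as described for the reversed cross-link database. All function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple


def adjacent_pairs(tokens: Sequence[str]) -> Set[Tuple[str, str]]:
    """Ordered pairs of neighbouring tokens, i.e. the AB entries a search can create."""
    return {(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)}


def all_ordered_pairs(symbols: Sequence[str]) -> Set[Tuple[str, str]]:
    return set(product(symbols, repeat=2))


def pair_covering_sequence(symbols: Sequence[str]) -> List[str]:
    """De Bruijn sequence of order 2 over `symbols`, made linear.

    Every ordered pair (x, y), including x == y, occurs exactly once as
    neighbours; k symbols need k*k + 1 tokens instead of 2*k*k.
    """
    k = len(symbols)
    a = [0] * 3
    cyclic: List[int] = []

    def db(t: int, p: int) -> None:
        # Standard recursive construction via Lyndon words, with n = 2.
        if t > 2:
            if 2 % p == 0:
                cyclic.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    tokens = [symbols[i] for i in cyclic]
    return tokens + tokens[:1]  # repeat the first token to keep the wrap-around pair


def reversed_xdb(sequence: str) -> str:
    """Decoy: the whole concatenated sequence written from C to N terminus."""
    return sequence[::-1]


if __name__ == "__main__":
    paper_example = list("caabacbbccCAABACBBCCcAaAbAcBaBbBcCaCbC")
    symbols = list("abcABC")
    print(len(paper_example), adjacent_pairs(paper_example) == all_ordered_pairs(symbols))
    xdb = "".join(pair_covering_sequence(["MAK", "GLKER"]))  # hypothetical peptides
    print(xdb, reversed_xdb(xdb))
```

For the six peptides of the example, writing the 36 ordered pairs separately takes 72 symbols, the example sequence uses 38, and the de Bruijn variant 37. In a real database the junctions must remain tryptic cleavage sites for the search engine to cut there; under the common trypsin rule, a junction followed by proline would not be cleaved, which a practical construction would need to handle.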
The cross-linked peptides can then be found as miscleaved peptides in the output of the database search. The score used to express the quality of the match between a spectrum and a cross-linked peptide is presented under "Results." When calculating our score, we considered all b- and y-ions of the cross-linked peptide. Other ions, such as those resulting from loss of water or ammonia, internal fragments, and multiply charged fragments, are observable (ref. 26: Gaucher S.P. Hadi M.Z. Young M.M. Influence of crosslinker identity and position on gas-phase dissociation of Lys-Lys crosslinked peptides. J. Am. Soc. Mass Spectrom. 2006; 17: 395-405) but are not currently included in the algorithm. Considering all possible fragments results in a large number of mass values and lowers the selectivity of the score at our current low fragment mass accuracy (±0.5 Da).

For the Mascot-independent matching of precursor masses with predicted cross-linked peptides, we wrote a Perl script that computes all predicted cross-linked peptides matching precursors within 4.5 ppm, based on the input protein sequences, the number of missed cleavages allowed (two), and the residues required for the linkage (lysine or the protein N terminus). Note that we focus on products composed of two linked peptides containing a single cross-linker molecule. Including other cross-link products is possible but increases the search space and consequently the background of the data analysis. Currently, data from peptides containing more than one cross-linker do not contribute to the background because, not being identified as doublets with 4-Da spacing, they are filtered out.

We used a cross-linker targeting amino groups, BS2G, as a 1:1 mixture of its unlabeled light and labeled heavy form (the latter containing four deuterium atoms) (ref. 27: Muller D.R. Schindler P. Towbin H. Wirth U. Voshol H. Hoving S. Steinmetz M.O. Isotope-tagged cross-linking reagents.
A new tool in mass spectrometric protein interaction analysis. Anal. Chem. 2001; 73: 1927-1934). Using a light/heavy mixture produces doublet mass signals whenever the cross-linker has been incorporated, whether between two peptides or on a single peptide. We began our analysis by selecting from the entire LC-MS/MS dataset only those fragmentation spectra of peptides with doublet signals, thus focusing our analysis on fragmentation products of cross-linked peptides (Fig. 1a). It should be noted that the doublet information serves solely to reduce the data to likely cross-linked peptides; in this way, the false positive rate of the database search is minimized. For small datasets, such as those obtained for a single protein or a small complex, this is not necessary, and any cross-linker can be used. Two observations allowed the construction of a special database for the identification of cross-linked peptides. First, a cross-linked peptide has the same mass as a peptide obtained by fusing the two li

References