Software for Peak Finding and Elemental Composition Assignment for Glycosaminoglycan Tandem Mass Spectra

Article · Open access · Peer-reviewed


2018; Elsevier BV; Volume: 17; Issue: 7; Language: English

10.1074/mcp.ra118.000590

ISSN

1535-9484

Authors

John D. Hogan, Joshua Klein, Jiandong Wu, Pradeep Chopra, Geert‐Jan Boons, Luís Carvalho, Cheng Lin, Joseph Zaia

Topic(s)

Carbohydrate Chemistry and Synthesis

Abstract

Glycosaminoglycans (GAGs) covalently linked to proteoglycans (PGs) are characterized by repeating disaccharide units and variable sulfation patterns along the chain. GAG length and sulfation patterns affect disease etiology, cellular signaling, and structural support for cells. We and others have demonstrated the usefulness of tandem mass spectrometry (MS2) for assigning the structures of GAG saccharides; however, manual interpretation of tandem mass spectra is time-consuming, so computational methods are needed. In the proteomics domain, the identification of monoisotopic peaks and charge states relies on algorithms that use averagine, the average building block of the compound class being analyzed. Although these methods perform well for protein and peptide spectra, they perform poorly on GAG tandem mass spectra, because a single average building block cannot capture the variable sulfation of GAG disaccharide units. In addition, interpreting the tandem mass spectra of GAG saccharides requires assigning product ion isotope patterns. To address these problems, we developed GAGfinder, the first tandem mass spectrum peak finding algorithm developed specifically for GAGs. We define peak finding as assigning experimental isotopic peaks directly to a given product ion composition, as opposed to deconvolution or peak picking, terms that more accurately describe the existing averagine-based methods mentioned above. GAGfinder is a targeted, brute-force approach to spectrum analysis that uses precursor composition information to generate all theoretical fragments. GAGfinder also performs peak isotope composition annotation, which is typically a subsequent step for averagine-based methods. Data are available via ProteomeXchange with identifier PXD009101.
Glycosaminoglycans (GAGs)¹ exist either as the glycan portion of proteoglycans (PGs) or as extracellular matrix (ECM) polysaccharides. The three classes of sulfated GAGs are heparan sulfate (HS), chondroitin sulfate (CS), and keratan sulfate (KS). Each is characterized by a long, linear chain, a repeating disaccharide unit specific to its class, and variable patterns of sulfation and acetylation. GAGs sit on the cell surface and in the ECM, and their sequences vary. As a result, they interact with many growth factors and growth factor receptors and thereby modulate cellular signaling and signal transduction pathways (1. Bishop J.R. Schuksz M. Esko J.D. Heparan sulphate proteoglycans fine-tune mammalian physiology. Nature. 2007; 446: 1030-1037, 2. Lindahl U. Li J.P. Interactions between heparan sulfate and proteins-design and functional implications. Int. Rev. Cell Mol. Biol. 2009; 276: 105-159). Furthermore, spatial and temporal regulation of GAG structures characterizes physiology and pathophysiology in eukaryotes.
¹Abbreviations used: AUC, area under the curve; CS, chondroitin sulfate; ECM, extracellular matrix; EDD, electron detachment dissociation; GAG, glycosaminoglycan; HS, heparan sulfate; KS, keratan sulfate; MS2, tandem mass spectrometry; NETD, negative electron transfer dissociation; PG, proteoglycan; ppm, parts-per-million; S/N, signal-to-noise ratio; TIC, total ion current.
For instance, cancer cells remodel HS chains in their microenvironments to avoid immune system targeting and to allow proliferation (3. Fuster M.M. Esko J.D. The sweet and sour of cancer: glycans as novel therapeutic targets. Nat. Rev. Cancer. 2005; 5: 526-542). In the motor neuron-degenerative disease amyotrophic lateral sclerosis, KS sulfation has been shown to correlate with disease progression (4. Hirano K. Ohgomori T. Kobayashi K. Tanaka F. Matsumoto T. Natori T. Matsuyama Y. Uchimura K. Sakamoto K. Takeuchi H. Hirakawa A. Suzumura A. Sobue G. Ishiguro N. Imagama S. Kadomatsu K. Ablation of Keratan Sulfate Accelerates Early Phase Pathogenesis of ALS. PLoS ONE. 2013; 8: e66969). Indeed, GAG expression is required for embryonic development (5. Perrimon N. Bernfield M. Specificities of heparan sulfate proteoglycans in developmental processes. Nature. 2000; 404: 725-728), and GAGs are required for the proper functioning of all mammalian biological systems (1. Bishop J.R. Schuksz M. Esko J.D. Heparan sulphate proteoglycans fine-tune mammalian physiology. Nature. 2007; 446: 1030-1037). Clearly, assigning GAG sequences from tandem mass spectral data is necessary to establish their roles in diverse disease mechanisms.
Tandem mass spectrometry (MS2) entails isolating a precursor ion in the first stage and dissociating it in subsequent stages. Manual interpretation of tandem mass spectra is tedious, time-consuming, and subjective. The first step of interpretation is to assign the m/z values and charge states of the product ions. Once this is done, neutral masses and isotope compositions can be assigned. An algorithm can then be used to identify the GAG sequence (7. Hu H. Huang Y. Mao Y. Yu X. Xu Y. Liu J. Zong C. Boons G. Lin C. Xia Y. Zaia J. A computational framework for heparan sulfate sequencing using high-resolution tandem mass spectra. Mol. Cell. Proteomics. 2014; 13: 2490-2502). Wolff and colleagues first applied electron activated dissociation methods to GAG oligosaccharides. They used both electron detachment dissociation (EDD) (8. Wolff J.J. Chi L. Linhardt R.J. Amster I.J. Distinguishing glucuronic from iduronic acid in glycosaminoglycan tetrasaccharides by using electron detachment dissociation. Anal. Chem. 2007; 79: 2015-2022) and negative electron transfer dissociation (NETD) (9. Wolff J.J. Leach III F.E. Laremore T.N. Kaplan D. Easterling M.E. Linhardt R.J. Amster I.J. Negative electron transfer dissociation of glycosaminoglycans. Anal. Chem. 2010; 82: 3460-3466). More recently, Huang and colleagues showed that electron activated dissociation is effective at minimizing sulfate loss during HS mass spectrometry experiments (10. Huang Y. Yu X. Mao Y. Costello C.E. Zaia J. Lin C. De novo sequencing of heparan sulfate oligosaccharides by electron-activated dissociation. Anal. Chem. 2013; 85: 11979-11986). Tandem mass spectra produced by electron activated dissociation are extremely rich: they contain many product ions with varying charge states and isotope patterns.
In the proteomics domain, several computational methods have been developed for automatic recognition of isotopic patterns and assignment of charge states and neutral mass values. These include THRASH (11. Horn D.M. Zubarev R.A. McLafferty F.W. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. 2000; 11: 320-332), Decon2LS (12. Jaitly N. Mayampurath A. Littlefield K. Adkins J.N. Anderson G.A. Smith R.D. Decon2LS: An open-source software package for automated processing and visualization of high resolution mass spectrometry data. BMC Bioinf. 2009; 10: 87), and MS-Deconv (13. Liu X. Inbar Y. Dorrestein P.C. Wynne C. Edwards N. Souda P. Whitelegge J.P. Bafna V. Pevzner P.A. Deconvolution and Database Search of Complex Tandem Mass Spectra of Intact Proteins. Mol. Cell. Proteomics. 2010; 9: 2772-2782), among others. These methods assume that product ion isotopic distributions match the pattern produced by the molecule's average building block, or averagine. Their performance on GAG saccharide tandem mass spectra is inadequate, however, because sulfation levels vary along GAG chains and the ³⁴S isotope is relatively abundant. Fig. 1 shows two examples of the large difference between the expected isotopic distributions of non-sulfated and fully sulfated GAG fragments. Plainly, no single GAG averagine could recover the correct monoisotopic peak for every fragment, and this leads to incorrect and missing assignments. Averagine-based approaches also do not assign elemental compositions to monoisotopic ions, a step necessary for interpreting GAG saccharide tandem mass spectra. We sought to solve these problems.
Previous work on GAG tandem mass spectrum analysis and annotation has typically been one step within a larger sequencing project. For instance, Yu and colleagues recently sequenced the dermatan sulfate (DS) chain of the pericellular PG decorin. They used a genetic algorithm based on known sulfate modification information from disaccharide analysis, and they mentioned their in-house data interpretation software only in passing (14. Yu Y. Duan J. Leach III F.E. Toida T. Higashi K. Zhang H. Zhang F. Amster I.J. Linhardt R.J. Sequencing the dermatan sulfate chain of Decorin. J. Am. Chem. Soc. 2017; 139: 16986-16995).
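The sulfur effect can be made concrete with a short example. The sketch below computes approximate isotope patterns by repeatedly convolving per-element isotope abundances at nominal-mass resolution. This simple convolution is a swapped-in technique, not the BRAIN algorithm that GAGfinder uses, and it ignores isotopic fine structure. The abundances are IUPAC representative values. The two formulas are illustrative choices made for this example:

- C14H21NO11, an unsulfated ΔHexA-HexNAc disaccharide.
- C12H19NO19S3, a trisulfated ΔHexA-HexN disaccharide.

```python
from typing import Dict, List

# Natural isotope abundances indexed by nominal mass offset from the lightest
# isotope (IUPAC representative values); sulfur has no isotope at offset +3.
ISOTOPE_ABUNDANCES: Dict[str, List[float]] = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a: List[float], b: List[float], max_peaks: int) -> List[float]:
    """Discrete convolution of two isotope patterns, truncated to max_peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, max_peaks)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_pattern(formula: Dict[str, int], max_peaks: int = 6) -> List[float]:
    """Relative abundances of M, M+1, M+2, ... (nominal mass), summing to 1."""
    pattern = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPE_ABUNDANCES[element], max_peaks)
    total = sum(pattern)
    return [p / total for p in pattern]


unsulfated = isotope_pattern({"C": 14, "H": 21, "N": 1, "O": 11})
trisulfated = isotope_pattern({"C": 12, "H": 19, "N": 1, "O": 19, "S": 3})

for name, pattern in [("unsulfated", unsulfated), ("trisulfated", trisulfated)]:
    # Express the first few peaks relative to the monoisotopic (M) peak.
    print(name, [round(p / pattern[0], 3) for p in pattern[:4]])
```

Relative to M, the M+2 peak of the trisulfated composition comes out several times larger than that of the unsulfated one, mostly because of ³⁴S. No single averagine-derived pattern can fit both.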
Chiu and colleagues made two recent attempts at automated GAG sequencing. The first is GAG-ID (15. Chiu Y. Huang R. Orlando R. Sharp J.S. GAG-ID: Heparan sulfate (HS) and heparin glycosaminoglycan high-throughput identification software. Mol. Cell. Proteomics. 2015; 14: 1720-1730). The second is a multivariate mixture model to estimate identification accuracy (16. Chiu Y. Schliekelman P. Orlando R. Sharp J.S. A multivariate mixture model to estimate the accuracy of glycosaminoglycan identifications made by tandem mass spectrometry (MS/MS) and database search. Mol. Cell. Proteomics. 2017; 16: 255-264). Both use a weighted hypergeometric distribution to match spectra to potential sequences. However, both papers describe a method that considers only high-intensity peaks rather than full isotopic distributions. The method also requires an extensive experimental workup for chemical derivatization, which replaces sulfate groups with heavy-isotope acetyl groups.
Averagine-based deisotoping and charge state deconvolution algorithms were developed to avoid the combinatorial explosion in the number of possible protein sequences as chain length increases. Because of this explosion, brute-force methods that search all possible proteins and protein product ions are not feasible. The number of possible GAGs also increases exponentially with chain length, but at a much lower rate. Fig. 2 plots the log10 of the number of possible structures as a function of chain length for unmodified proteins and for HS, CS, and KS GAG saccharides. The slope for each GAG class is much smaller than the slope for proteins. The gap would be larger still if post-translational modifications were included in the protein count.
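The slope argument can be checked with a back-of-the-envelope calculation. If each of n positions can take one of k forms, there are k^n linear sequences, and their log10 grows linearly with slope log10(k). The protein case below uses the 20 standard amino acids. The GAG alphabet size is a hypothetical placeholder chosen only for illustration; it is not the counting scheme behind Fig. 2.

```python
import math


def log10_sequence_count(alphabet_size: int, length: int) -> float:
    """log10 of alphabet_size ** length, the number of linear sequences of
    `length` units when each unit has `alphabet_size` possible forms."""
    return length * math.log10(alphabet_size)


PROTEIN_ALPHABET = 20          # standard amino acids, no modifications
HYPOTHETICAL_GAG_ALPHABET = 4  # hypothetical placeholder, not from the paper

for n in (4, 8, 12, 16):
    protein = log10_sequence_count(PROTEIN_ALPHABET, n)
    gag = log10_sequence_count(HYPOTHETICAL_GAG_ALPHABET, n)
    print(f"length {n:2d}: protein {protein:5.2f}, hypothetical GAG {gag:5.2f}")
```

Whatever the true per-position count is for a GAG class, its curve on a log10 axis is a line whose slope is set by that count. This is why the smaller GAG alphabets keep brute-force search tractable.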
Given the reduced search space and the variable sulfation along GAG chains, we developed GAGfinder. It is a brute-force product ion search algorithm, written in the Python programming language, for MS2 of GAG saccharides of a given composition. GAGfinder iterates through every possible fragment of a GAG composition at multiple charge states and tests each fragment's theoretical isotopic distribution against the observed spectral pattern. GAGfinder is available for download at http://www.bumc.bu.edu/msr/software. This paper describes the steps in GAGfinder. It also compares GAGfinder with an averagine-based peak finding algorithm for identifying GAG monoisotopic product ions, charge states, and neutral mass values.
A flowchart of the steps GAGfinder performs is shown in Fig. 3, and the details of each step are described below. The term "product ion" refers to ions observed in tandem mass spectra. The term "fragment" refers to theoretical GAG saccharide substructures in a database.
GAGfinder requires several inputs and accepts several optional ones. The spectrum data must be in the mzML file format (17. Deutsch E. mzML: a single, unifying data format for mass spectrometer output. Proteomics. 2008; 8: 2776-2777). Raw data can be converted using any format conversion tool, such as MSConvert (18. Chambers M.C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012; 30: 918-920) or compassXport (Bruker Daltonics, Inc.). The other required inputs are the GAG class, the precursor m/z, the precursor charge, and the output format for the results. Either the top percentile or the top N results can be returned, but not both.
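A command line interface for these required inputs can be sketched with Python's standard argparse module, which can make the top-percentile and top-N options mutually exclusive. The flag names below are hypothetical; this sketch does not reproduce GAGfinder's actual option names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the required inputs described in the text."""
    parser = argparse.ArgumentParser(description="GAG MS2 peak finding (sketch)")
    parser.add_argument("--mzml", required=True, help="spectrum file in mzML format")
    parser.add_argument("--gag-class", required=True, choices=["HS", "CS", "KS"])
    parser.add_argument("--precursor-mz", required=True, type=float)
    parser.add_argument("--precursor-charge", required=True, type=int,
                        help="signed precursor charge, e.g. -5")
    # The output format is either a top percentile or a top N, never both.
    output = parser.add_mutually_exclusive_group(required=True)
    output.add_argument("--top-percent", type=float, help="return the top X percent")
    output.add_argument("--top-n", type=int, help="return the top N results")
    return parser


args = build_parser().parse_args(
    ["--mzml", "sample.mzML", "--gag-class", "HS",
     "--precursor-mz", "378.104", "--precursor-charge=-1", "--top-n", "50"]
)
print(args.top_n, args.top_percent)
```

If both `--top-n` and `--top-percent` are supplied, argparse rejects the command line with a usage error. The either/or rule is therefore enforced before any spectrum is read.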
Optional inputs include:
- the reducing-end derivatization formula, if any;
- the adducted metal and the number of adducts, if there is metal adduction;
- the NETD cation reagent, if NETD was used;
- a user-specified internal precision for mapping fragments to isotopic distributions;
- a Boolean value indicating whether noise has already been removed from the spectrum;
- the number of labile sulfate losses to consider.
These inputs are arguments to the GAGfinder command line program.
The first step of GAGfinder is connecting to GAGfragDB, a database developed in SQLite for easy storage and retrieval of all possible fragments of precursor compositions up to hexadecamers. GAGfragDB contains 4150 unique compositions, 65,664 fragments, and 17,156,928 precursor-fragment mappings. In HS, the composition with the most possible fragments is [1, 7, 8, 4, 15], keyed as (dHexA, HexA, HexN, Ac, SO3); it has 21,299 child fragments. GAGfragDB includes a controlled vocabulary that gives each fragment a unique text identifier without assuming anything about the structure of the precursor or the fragment. In other words, a fragment with one composition that could be a terminal fragment or any number of internal fragments has only one identifier. Supplemental Fig. S1 shows the relational schema for GAGfragDB. The connection to GAGfragDB is established with the Python sqlite3 module.
After connecting to GAGfragDB, GAGfinder loads the mzML file into Python using the pymzML module (19. Bald T. Barth J. Niehues A. Specht M. Hippler M. Fufezan C. pymzML–Python module for high-throughput bioinformatics on mass spectrometry data. Bioinformatics. 2012; 28: 1052-1053). The pymzML module provides several spectrum processing methods, including centroiding peaks and finding peaks in the spectrum within a particular error tolerance, among others. Once the tandem mass spectral data have been loaded, GAGfinder normalizes and averages the scans of the data file using the total ion current (TIC). It first divides each scan in the file by the summed TIC intensity and then calculates the average over all scans. This step prevents any single scan from biasing the results and is performed using methods in the pymzML package. After normalizing the scans, GAGfinder removes noise from the spectrum unless the user has already denoised it prior to runtime. For this step GAGfinder uses an implementation of the noise reduction algorithm MasSPIKE (20. Kaur P. O'Connor P.B. Algorithms for Automatic Interpretation of High Resolution Mass Spectra. J. Am. Soc. Mass Spectrom. 2006; 17: 459-468).
Given the precursor m/z and charge, the neutral mass of the precursor can be calculated. From this mass and the GAG class, the precursor composition can be determined. GAGfinder accounts for metal adduction and reducing-end derivatization when calculating the neutral mass to be matched against compositions in GAGfragDB. It selects the composition whose neutral mass is closest to the calculated precursor mass as the precursor composition.
To reduce the search space as much as possible, GAGfinder attempts to determine the monosaccharide at each precursor terminus. This is possible in several cases, and Fig. 4 shows the decision tree. First, the non-reducing end may be an unsaturated uronic acid, as in CS and HS saccharides generated by polysaccharide lyase digestion. In that case GAGfinder assumes that the reducing-end monosaccharide is a hexuronic acid if the precursor contains an odd number of monosaccharides, and a hexosamine if it contains an even number. Otherwise, GAGfinder checks whether the two monosaccharide types of the repeating disaccharide for the current GAG class occur in unequal numbers. If they do, the more abundant monosaccharide will be at both the non-reducing and reducing ends. If the numbers are equal, GAGfinder cannot assign the terminal monosaccharides and must search the entire search space.
Next, GAGfinder retrieves every possible fragment for the current precursor from GAGfragDB. These fragments include glycosidic bond cleavages and all cross-ring cleavages except those that involve cleaving adjacent bonds. Supplemental Fig. S2 shows each cross-ring cleavage GAGfinder considers.
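The precursor-composition and terminus steps can be sketched in code under several stated assumptions:
- Ions are deprotonated, with no metal adducts and no reducing-end derivatization.
- Residue masses are computed from standard monoisotopic atomic masses.
- A brute-force enumeration of HS compositions stands in for a GAGfragDB lookup.

The function names and the sulfation upper bound are hypothetical, chosen for this illustration only.

```python
from itertools import product
from typing import Iterator, Optional, Tuple

PROTON = 1.007276  # proton mass (Da)
WATER = 18.010565  # H2O, added once for an intact chain
# Monoisotopic residue masses in key order (dHexA, HexA, HexN, Ac, SO3).
RESIDUE_MASSES = (158.021523, 176.032088, 161.068808, 42.010565, 79.956815)

Composition = Tuple[int, int, int, int, int]


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a deprotonated ion [M - |z|H]^|z|- (no adducts assumed)."""
    return abs(charge) * (mz + PROTON)


def composition_mass(comp: Composition) -> float:
    return WATER + sum(n * m for n, m in zip(comp, RESIDUE_MASSES))


def hs_compositions(max_length: int) -> Iterator[Composition]:
    """Hypothetical HS composition space: alternating uronic acid/HexN chain,
    at most one dHexA, Ac <= HexN, and an assumed sulfate upper bound."""
    for dhexa, hexa, hexn in product(range(2), range(max_length + 1), range(max_length + 1)):
        uronic = dhexa + hexa
        if not 1 <= uronic + hexn <= max_length or abs(uronic - hexn) > 1:
            continue
        for ac in range(hexn + 1):
            # Assumed bound: 2-O on each uronic acid; N-, 6-O, 3-O on each HexN,
            # with N-sulfation unavailable where the HexN is N-acetylated.
            for so3 in range(uronic + 3 * hexn - ac + 1):
                yield (dhexa, hexa, hexn, ac, so3)


def closest_composition(mz: float, charge: int, max_length: int = 10) -> Composition:
    target = neutral_mass(mz, charge)
    return min(hs_compositions(max_length), key=lambda c: abs(composition_mass(c) - target))


def terminal_residues(comp: Composition) -> Optional[Tuple[str, str]]:
    """(non-reducing end, reducing end) following the decision tree, or None."""
    dhexa, hexa, hexn, _, _ = comp
    if dhexa:
        n_mono = dhexa + hexa + hexn
        return ("dHexA", "HexA" if n_mono % 2 else "HexN")
    if hexa != hexn:
        more = "HexA" if hexa > hexn else "HexN"
        return (more, more)
    return None  # equal counts: termini undetermined, search everything


comp = closest_composition(378.104185, -1)  # e.g. a [M-H]- precursor
print(comp, terminal_residues(comp))
```

Picking the nearest mass is only safe if the composition space has no near-isobaric neighbors at the instrument's accuracy. A real implementation would also report the mass error of the chosen composition.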
GAGfragDB stores the theoretical fragments as neutral masses without considering sulfate losses or any other modifications. GAGfinder must therefore modify and search each fragment in order to maximize spectrum coverage. For each fragment, the modifications considered are:
- water loss, for glycosidic fragments only;
- hydrogen loss, up to 2;
- sulfate loss, up to the number designated by the user;
- reducing-end derivatization, if any.
This information is used to determine whether a given fragment contains the reducing terminus. Every combination of these modifications is passed through the algorithm, and product ions with the same chemical composition are merged.
Once all the theoretical fragments have been retrieved and modified as needed, they are scored against the tandem mass spectrum. For each fragment, GAGfinder considers charge states from −1 up to the precursor charge plus one; for a −5 precursor, for example, this means −1 through −4. There are two main reasons for excluding the precursor charge itself. First, product ions with the same charge state as the precursor make up a small percentage of all product ions. Including this charge state would find only a few more product ions while introducing more false positives. Second, many product ions with the same charge state as the precursor are derivatives of the precursor and provide no additional structural information.
A theoretical relative isotopic distribution (TID) is calculated for each fragment using the BRAIN algorithm (21. Dittwald P. Claesen J. Burzykowski T. Valkenborg D. Gambin A. BRAIN: a universal tool for high-throughput calculations of the isotopic distribution for mass spectrometry. Anal. Chem. 2013; 85: 1991-1994). BRAIN uses polynomial expansion together with the Newton-Girard theorem and Viète's formulae.
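The enumeration of modified fragments and candidate charge states might look like the sketch below, under these assumptions:
- Formulas are tuples of (C, H, N, O, S) counts.
- Merging is done by keying on the resulting formula and charge.
- Reducing-end derivatization is omitted for brevity.

The fragment labels and function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Formula = Tuple[int, int, int, int, int]  # counts of (C, H, N, O, S)
WATER: Formula = (0, 2, 0, 1, 0)
HYDROGEN: Formula = (0, 1, 0, 0, 0)
SULFATE: Formula = (0, 0, 0, 3, 1)  # SO3


def subtract(formula: Formula, loss: Formula, times: int) -> Formula:
    return tuple(f - times * l for f, l in zip(formula, loss))


def candidate_ions(fragments: Dict[str, Tuple[Formula, bool]],
                   max_sulfate_loss: int,
                   precursor_charge: int) -> Dict[Tuple[Formula, int], List[str]]:
    """Map (modified formula, charge) -> labels of every fragment/modification
    combination that produces it, so identical compositions are merged.
    `fragments` maps a label to (formula, is_glycosidic)."""
    merged: Dict[Tuple[Formula, int], List[str]] = {}
    # Charges from -1 down to precursor charge + 1 (e.g. -1..-4 for a -5 precursor).
    charges = range(-1, precursor_charge, -1)
    for label, (formula, glycosidic) in fragments.items():
        water_options = (0, 1) if glycosidic else (0,)  # water loss: glycosidic only
        sulfate_options = range(min(max_sulfate_loss, formula[4]) + 1)
        for w, h, s in product(water_options, range(3), sulfate_options):
            modified = subtract(subtract(subtract(formula, WATER, w), HYDROGEN, h), SULFATE, s)
            tag = f"{label}-{w}H2O-{h}H-{s}SO3"
            for z in charges:
                merged.setdefault((modified, z), []).append(tag)
    return merged


# Hypothetical fragments: "frag_b" and "frag_c" share a composition and merge.
ions = candidate_ions(
    {"frag_a": ((12, 19, 1, 19, 3), True),
     "frag_b": ((6, 8, 0, 6, 0), False),
     "frag_c": ((6, 8, 0, 6, 0), False)},
    max_sulfate_loss=2,
    precursor_charge=-5,
)
print(len(ions))
```

Keying on the composition means each distinct ion is scored once, while the list of tags preserves every structural interpretation for the output annotation.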
Once the TID is calculated, GAGfinder searches the tandem mass spectrum for product ion peaks at the m/z values of the TID. It uses either a user-specified error tolerance or the default tolerance of 20 parts-per-million (ppm), and stores the matched peaks as the experimental isotopic distribution (EID). The EID is then divided by the sum of its intensities so that it is also a relative distribution. GAGfinder employs a G-test of goodness-of-fit to determine how similar the EID is to the TID. Equation 1 gives the G score, where i indexes the peaks in the matched isotopic distributions:
$$G = 2\sum_{i} \mathrm{EID}_i \ln\left(\frac{\mathrm{EID}_i}{\mathrm{TID}_i}\right) \qquad \text{(Eq. 1)}$$
Under the null hypothesis that the EID has the same distribution as the TID, the G score approximately follows a chi-squared distribution and can therefore be used to compute p values. A lower G score thus yields a higher p value and represents a better fit. Once all theoretical fragments have been scored, they are ranked by increasing G score. The top percentile or top N results, as requested by the user, are then saved to an output file. This file contains the fragment m/z, charge, intensity, annotation(s), G score, and error in ppm.
We chose ten synthetic GAG standards to demonstrate the effectiveness of GAGfinder (Fig. 5). These standards were chosen for their range of modification distributions and precursor charges. Compounds 1 and 10 were synthesized as described (22. Prabhu A. Venot A. Boons G.J. New set of orthogonal protecting groups for the modular synthesis of heparan sulfate fragments. Organic Letters. 2003; 5: 4975-4978). Compound 2 was a generous gift from Prof. Jian Liu, University of North Carolina, Chapel Hill. Compound 3 was purchased from New England Biolabs (Andover, MA). Compound 5 was purchased as the Arixtra pharmaceutical preparation and desalted by size exclusion chromatography.
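Peak matching within a ppm window and the G score of Eq. 1 could be implemented as in the sketch below, under these assumptions:
- The spectrum is a list of centroided (m/z, intensity) pairs.
- The most intense peak inside the window is taken as the match.
- Unmatched isotopes contribute zero, following the limit of x ln x as x → 0.
- A fragment with no matched peaks gets an infinite score.
- TID abundances are positive.

The function names are hypothetical. A p value could be obtained with `scipy.stats.chi2.sf(G, df)`; the choice of degrees of freedom is left open here.

```python
import math
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def match_peaks(spectrum: Sequence[Peak], tid_mz: Sequence[float],
                ppm: float = 20.0) -> List[float]:
    """Intensity of the most intense peak within +/- ppm of each TID m/z (0 if none)."""
    matched = []
    for target in tid_mz:
        tol = target * ppm * 1e-6
        in_window = [inten for mz, inten in spectrum if abs(mz - target) <= tol]
        matched.append(max(in_window, default=0.0))
    return matched


def g_score(spectrum: Sequence[Peak], tid_mz: Sequence[float],
            tid_abundance: Sequence[float], ppm: float = 20.0) -> float:
    """G = 2 * sum_i EID_i * ln(EID_i / TID_i), both as relative distributions."""
    eid = match_peaks(spectrum, tid_mz, ppm)
    total = sum(eid)
    if total == 0:
        return math.inf  # nothing matched: worst possible fit
    tid_total = sum(tid_abundance)
    g = 0.0
    for observed, expected in zip(eid, tid_abundance):
        p = observed / total
        q = expected / tid_total  # assumed > 0 for every isotope peak
        if p > 0:
            g += p * math.log(p / q)
    return 2.0 * g


# Toy TID at z = -2 (isotope spacing ~1.00336 / 2) and two candidate spectra.
tid_mz = [500.0, 500.50168, 501.00336]
tid = [0.60, 0.30, 0.10]
good = [(500.0001, 60.0), (500.5018, 30.0), (501.0035, 10.0)]
poor = [(500.0001, 20.0), (500.5018, 30.0), (501.0035, 50.0)]
ranked = sorted([("good", g_score(good, tid_mz, tid)),
                 ("poor", g_score(poor, tid_mz, tid))], key=lambda item: item[1])
print(ranked)
```

Because both distributions are normalized over the same matched indices, G is twice a Kullback-Leibler divergence: it is never negative and reaches 0 only for a perfect match. This is why ranking by increasing G puts the best fits first.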
Compounds 4, 6, 7, and 8 were acquired through a publicly available set of HS standard saccharides funded by the NIH and maintained by the Zaia laboratory (http://www.bumc.bu.edu/zaia/gag-synthetic-saccharides-available/). Compound 9 was isolated from porcine intestinal mucosa as described (23. Huang Y. Mao Y. Zong C. Lin C. Boons G. Zaia J. Discovery of a heparan sulfate 3-O-sulfation specific peeling reaction. Anal. Chem. 2015; 87: 592-600). These saccharides were subjected to electron detachment dissociation (EDD) or negative electron transfer dissociation (NETD) on a Bruker solariX 12T FTMS instrument. For each saccharide, GAGfinder was run with these settings:
- retrieving 100% of tested fragments;
- allowing up to two sulfate losses;
- using the default tolerance of 20 ppm when mapping fragments to isotopic distributions.
For saccharides 1–5, noise had not been previously removed, so GAGfinder applied MasSPIKE to remove it; for saccharides 6–10, noise had already been removed. Although in principle GAGfinder can handle all classes of GAGs, we show results for HS saccharides in the present work. Details of the tandem mass spectrometric acquisition methods can be found in Hu et al. (7. Hu H. Huang Y. Mao Y. Yu X. Xu Y. Liu J. Zong C. Boons G. Lin C. Xia Y. Zaia J. A computational framework for heparan sulfate sequencing using high-resolution tandem mass spectra. Mol. Cell. Proteomics. 2014; 13: 2490-2502). Raw data files were converted to mzML format for input into GAGfinder. Conversion used either MSConvert GUI version 3.0.5084 (18. Chambers M.C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012; 30: 918-920) or the compassXport command line utility 3.0.13 (Bruker Daltonics, Inc.).
The mass spectrometry glycomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (24. Vizcaino J.A. Csordas A. del-Toro N. Dianes J.A. Griss J. Lavidas I. Mayer G. Perez-Riverol Y. Reisinger F. Ternent T. Xu Q.W. Wang R. Hermjakob H. 2016 update of the PRIDE database and related tools. Nucleic Acids Res. 2016; 44: D447-D456) partner repository with the data set identifier PXD009101.
We first sought to demonstrate the ability of GAGfinder to identify product ion isotope clusters and charge states. To do this, we compared the list of product ions from GAGfinder with the list from a traditional averagine-based method, the SNAP peak finder in Bruker DataAnalysis 4.2. To retrieve every product ion SNAP identified, we used these thresholds:
- quality factor: 0;
- signal-to-noise ratio (S/N): 1;
- relative intensity (base peak): 0%;
- absolute intensity: 0.
For each GAG saccharide tested, we set the maximum charge state to the absolute value of the precursor charge minus one, so that SNAP would behave comparably to GAGfinder. We set the repetitive building block to C6H11.375N1.125O9.5S1.5, as used in previous methods (25. Maxwell E. Tan Y. Tan Y. Hu H. Benson G. Aizikov K. Conley S. Staples G.O. Slysz G.W. Smith R.D. Zaia J. GlycReSoft: A software package for automated recognition of glycans from LC/MS data. PLoS ONE. 2012; 7: e45474). SNAP returned a matrix with columns for m/z, charge, intensity, resolving power, and quality factor.
To judge GAGfinder's performance in assigning tandem mass spectral monoisotopic product ions and charge states, we employed two separate statistical methods. Both required unbiased expert manual selection of monoisotopic product ion peaks to serve as the set of true positives.
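The Hit vector of Eq. 3, the PerfScore of Eq. 2, and the permutation procedure described next can be built from an expert-curated peak list as sketched below, under these assumptions:
- Each reported product ion is a (monoisotopic m/z, charge) pair.
- A report counts as a real hit when an expert peak with the same charge lies within a 20 ppm tolerance.
- The permuted scores are summarized as an empirical p value with a +1 correction, one common convention.

The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Ion = Tuple[float, int]  # (monoisotopic m/z, charge)


def hit_vector(reported: Sequence[Ion], expert: Sequence[Ion],
               ppm: float = 20.0) -> List[int]:
    """Hit_j = 1 if reported ion j matches an expert-selected ion with the same
    charge and an m/z within `ppm`, else 0."""
    hits = []
    for mz, z in reported:
        tol = mz * ppm * 1e-6
        hits.append(int(any(z == ez and abs(mz - emz) <= tol for emz, ez in expert)))
    return hits


def perf_score(g_scores: Sequence[float], hits: Sequence[int]) -> float:
    """PerfScore = sum_j G_j * Hit_j; smaller is better."""
    return sum(g * h for g, h in zip(g_scores, hits))


def permutation_test(g_scores: Sequence[float], hits: Sequence[int],
                     n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Observed PerfScore and the (+1-corrected) fraction of permuted Hit
    vectors scoring at least as well (an assumed empirical p value)."""
    rng = random.Random(seed)
    observed = perf_score(g_scores, hits)
    shuffled = list(hits)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if perf_score(g_scores, shuffled) <= observed:
            as_good += 1
    return observed, (as_good + 1) / (n_perm + 1)


reported = [(378.1042, -1), (456.0321, -2), (612.5510, -3), (700.2000, -2)]
expert = [(378.1040, -1), (456.0321, -1), (700.2001, -2)]
hits = hit_vector(reported, expert)
g_scores = [0.1, 4.0, 6.0, 0.2]  # toy G scores, one per reported ion
print(hits, permutation_test(g_scores, hits))
```

In this toy case the real hits hold the two lowest G scores. Only 1 of the 6 possible placements of two hits does as well, so the empirical p value is near 1/6. With realistic spectra containing many more ions, the permutation distribution becomes far more informative.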
In both methods, GAGfinder returned scores for 100% of the tested theoretical fragments to ensure maximum spectral coverage. The first method compared GAGfinder's performance against that of a random selection of monoisotopic product ions. The second compared it against that of an averagine-based peak finding algorithm.
The first method was a permutation test that compared GAGfinder's selection of true positive product ion peaks with a random selection of product ion peaks. First, we calculated a performance score (PerfScore) for the GAGfinder results:
$$\mathrm{PerfScore} = \sum_{j} G_j \, \mathrm{Hit}_j \qquad \text{(Eq. 2)}$$
where $j$ is the index of the current product ion, $G_j$ is the G score for fragment $j$, and
$$\mathrm{Hit}_j = \begin{cases} 1, & \text{if product ion } j \text{ is a ``real'' hit} \\ 0, & \text{if product ion } j \text{ is not a ``real'' hit} \end{cases} \qquad \text{(Eq. 3)}$$
We then permuted the Hit vector 10,000 times and recalculated the performance score for each permutation. Because G scores are smaller for better fits, a smaller performance score represents better performance. The performance scores of the
