A Method for Automatically Interpreting Mass Spectra of 18O-Labeled Isotopic Clusters
2006; Elsevier BV; Volume: 6; Issue: 2; Language: English
DOI: 10.1074/mcp.m600148-mcp200
ISSN: 1535-9484
Authors: Christopher J. Mason, Terry M. Therneau, Jeanette E. Eckel‐Passow, Kenneth L. Johnson, Ann L. Oberg, Janet E. Olson, K. Sreekumaran Nair, David C. Muddiman, H. Robert Bergen
Topic(s): Cancer, Hypoxia, and Metabolism
Abstract: ¹⁶O/¹⁸O labeling is one differential proteomics technology among many that promises diagnostic and prognostic biomarkers of disease. Although the incorporation of ¹⁸O in the C-terminal carboxyl group during endoproteinase digestion in the presence of H₂¹⁸O makes the process of labeling facile, the ease and effectiveness of label incorporation have in some regards been outweighed by the difficulties in interpreting the resulting spectra. Complex isotope patterns result from the composition of unlabeled (¹⁸O₀), singly labeled (¹⁸O₁), and doubly labeled (¹⁸O₂) species as well as contributions from the naturally occurring isotopes (e.g. ¹³C and ¹⁵N). Moreover, because labeling is enzymatic, the number of ¹⁸O atoms incorporated can vary from peptide to peptide. Finally, it is difficult to distinguish highly up-regulated from highly down-regulated or C-terminal peptides. We have developed an algorithm entitled regression analysis applied to mass spectrometry (RAAMS) that automatically, rapidly, and confidently interprets spectra of ¹⁸O-labeled peptides without requiring chemical composition information derived from product ion spectra. The algorithm is able to measure the effective ¹⁸O incorporation rate due to variable enzyme substrate specificity of the pseudosubstrate during the isotope exchange reaction and corrects for the ¹⁸O₀ abundance that remains in the labeled sample when using a two-step digestion/labeling procedure. We have also incorporated a method for distinguishing pure ¹⁸O₀ from pure ¹⁸O₂ peptides utilizing impure H₂¹⁸O. The algorithm operates on centroided peak lists and is therefore very fast: nine chromatograms, averaging 1,168 spectra and 6,761 isotopic clusters each, were interpreted in an average of 45 s per chromatogram. RAAMS is fast enough (average, 38 ms/spectrum) to allow the possibility of performing information-dependent MS/MS on a chromatographic time scale on species exceeding predetermined ratio thresholds.
We describe in detail the operation of the algorithm and demonstrate its use on datasets with known and unknown ratios.

Differential proteomics technologies promise diagnostic and prognostic biomarkers to reduce the burden of disease (1. Baker M. In biomarkers we trust? Nat. Biotechnol. 2005; 23: 297-304. 2. Diamandis E.P. Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol. Cell. Proteomics. 2004; 3: 367-378. 3. Ransohoff D.F. Lessons from controversy: ovarian cancer screening and serum proteomics. J. Natl. Cancer Inst. 2005; 97: 315-319. 4. Anderson N.L., Anderson N.G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics. 2002; 1: 845-867). ¹⁶O/¹⁸O stable isotope labeling is one tool among many with which to quantify the relative expression differences of proteins between two biological samples (5. Yao X., Freas A., Ramirez J., Demirev P.A., Fenselau C. Proteolytic ¹⁸O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal. Chem. 2001; 73: 2836-2842).
¹⁶O/¹⁸O labeling, in contrast to other popular differential labeling methodologies such as ICAT, isobaric tag for relative and absolute quantitation (iTRAQ™),¹ and stable isotope labeling of amino acids in cell culture (SILAC), (a) does not select for peptides containing certain amino acids (a limitation of ICAT), (b) does not require fragmentation spectra (tandem MS) to record an abundance ratio (a limitation of iTRAQ), and (c) is applicable to biological specimens such as serum, plasma, and cerebrospinal fluid (a limitation of SILAC) (6. Gygi S.P., Rist B., Gerber S.A., Turecek F., Gelb M.H., Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 1999; 17: 994-999. 7. Ross P.L., Huang Y.N., Marchese J.N., Williamson B., Parker K., Hattan S., Khainovski N., Pillai S., Dey S., Daniels S., Purkayastha S., Juhasz P., Martin S., Bartlet-Jones M., He F., Jacobson A., Pappin D.J. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics. 2004; 3: 1154-1169. 8. Ong S.E., Blagoev B., Kratchmarova I., Kristensen D.B., Steen H., Pandey A., Mann M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics. 2002; 1: 376-386).

¹ The abbreviations used are: iTRAQ, isobaric tag for relative and absolute quantitation; SILAC, stable isotope labeling of amino acids in cell culture; THRASH, thorough high resolution analysis of spectra by Horn; RAAMS, regression analysis applied to mass spectrometry; SCX, strong cation exchange; TIC, total ion current.

In a typical ¹⁸O experiment, two samples (or two pools of samples) of a biological matrix such as serum are collected, with one typically representing a normal or control population and the other representing a disease state, although other arrangements are possible. One sample is digested with an endoproteinase, typically trypsin (9. Hedstrom L. Serine protease mechanism and specificity. Chem. Rev. 2002; 102: 4501-4524), in the presence of H₂¹⁶O, whereas the other sample is digested with the same enzyme in H₂¹⁸O. The trypsin-catalyzed exchange of up to two ¹⁶O atoms for ¹⁸O atoms at the C-terminal carboxyl group of the peptide provides a mass shift of 2 (¹⁸O₁) or 4 (¹⁸O₂) daltons. The two digests are then mixed and subjected to mass spectrometry, where the isotopic clusters for each peptide appear as a set of peaks corresponding to the unlabeled (¹⁸O₀) and labeled (¹⁸O₁ and ¹⁸O₂) species. Measuring the heights of these peaks gives information about the relative abundances of the various peptides in the two samples.

Interpreting the resulting spectra, however, is not completely straightforward. Contributions from the artificially enriched ¹⁸O isotopes and naturally occurring isotopes (particularly ¹³C and ³⁴S) combine to form complex, overlapping isotopic distributions, especially for high molecular weight peptides where the monoisotopic peak is no longer the most abundant isotopic peak. Schnolzer et al. (10. Schnolzer M., Jedrzejewski P., Lehmann W.D. Protease-catalyzed incorporation of ¹⁸O into peptide fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass spectrometry. Electrophoresis.
1996; 17: 945-953) demonstrated that once the initial cleavage and exchange has occurred, the proteolytic peptide forms a pseudosubstrate for the enzyme, causing the incorporation of a second ¹⁸O atom. However, different peptide pseudosubstrates have different reaction rates, Km, with trypsin, which causes peptide-to-peptide variability in the incorporation of ¹⁸O. The reaction rate is determined by such factors as peptide length (10) and sequence composition (11. Yao X., Afonso C., Fenselau C. Dissection of proteolytic ¹⁸O labeling: endoprotease-catalyzed ¹⁶O-to-¹⁸O exchange of truncated peptide substrates. J. Proteome Res. 2003; 2: 147-152).

Recent digestion protocols (11; 12. Stratagene. Prolytica™ ¹⁸O Labeling Kit Instruction Manual. Stratagene, La Jolla, CA, 2004) introduce a further complication because two digestion steps are utilized: a first digestion using solution-based trypsin in H₂¹⁶O and a second digestion using immobilized trypsin in H₂¹⁸O. For peptides with high incorporation rates, the two-step procedure is effectively the same as the single step digestion; however, with low incorporation rate peptides, the possibility arises that a peptide molecule cleaved in H₂¹⁶O may never reassociate with trypsin in H₂¹⁸O. Bantscheff et al. (13. Bantscheff M., Dumpelfeld B., Kuster B.
Femtomole sensitivity post-digest ¹⁸O labeling for relative quantification of differential protein complex composition. Rapid Commun. Mass Spectrom. 2004; 18: 869-876) observed that, when using the two-step digestion procedure, it is important to account for the peptide molecules with no ¹⁸O atoms at their C termini that remain in the labeled sample. In addition to these complications, other factors that can affect the labeling process include back exchange and pH (14. Stewart I.I., Thomson T., Figeys D. ¹⁸O labeling: a tool for proteomics. Rapid Commun. Mass Spectrom. 2001; 15: 2456-2465).

Several different methods are described in the literature for calculating ¹⁶O/¹⁸O ratios while accounting for variable incorporation. Yao et al. (5) used a ratio combining the experimental abundances of the ¹⁸O₀, ¹⁸O₁, and ¹⁸O₂ peaks along with the theoretical isotopic abundances of those peaks; chemical composition information derived from known peptide sequence or from product ion spectra was used to obtain the theoretical isotopic abundances. Johnson and Muddiman (15. Johnson K.L., Muddiman D.C. A method for calculating ¹⁶O/¹⁸O peptide ion ratios for the relative quantification of proteomes. J. Am. Soc. Mass Spectrom. 2004; 15: 437-445) removed the requirement for sequence identification/product ion spectra by utilizing the average amino acid averagine (16. Senko M.W., Beu S.C., McLafferty F.W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom.
1995; 6: 229-233) to calculate approximate chemical compositions, by modeling the contributions of ¹³C and ³⁴S with a power function, and by including this in the ratio calculation. Both of these calculations assume a single step digestion and do not account for ¹⁸O₀ abundance in the labeled sample.

At least two software implementations exist that perform ratio calculations; both have limitations. Halligan et al. (17. Halligan B.D., Slyper R.Y., Twigger S.N., Hicks W., Olivier M., Greene A.S. ZoomQuant: an application for the quantitation of stable isotope labeled peptides. J. Am. Soc. Mass Spectrom. 2005; 16: 302-306) automated the process of combining the peptide sequence information from Sequest (18. Eng J.K., McCormack A.L., Yates J.R. III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989) with either the Yao et al. (5) or Johnson and Muddiman (15) ratio calculations. They required zoom scans, utilized the protein identification data to determine the monoisotopic mass, and assumed that this mass appeared experimentally. Qian et al. (19. Qian W.J., Monroe M.E., Liu T., Jacobs J.M., Anderson G.A., Shen Y., Moore R.J., Anderson D.J., Zhang R., Calvano S.E., Lowry S.F., Xiao W., Moldawer L.L., Davis R.W., Tompkins R.G., Camp D.G. II, Smith R.D.
Quantitative proteome analysis of human plasma following in vivo lipopolysaccharide administration using ¹⁶O/¹⁸O labeling and the accurate mass and time tag approach. Mol. Cell. Proteomics. 2005; 4: 700-709) utilized THRASH (20. Horn D.M., Zubarev R.A., McLafferty F.W. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. 2000; 11: 320-332) to detect peptide pairs and then applied the ratio equations of Johnson and Muddiman (15). THRASH, however, uses a single theoretical isotopic distribution and must separately detect the ¹⁸O₀ and ¹⁸O₂ species; therefore it will have difficulty with more massive peptides or those with incomplete incorporation.

Our current work describes a completely automated method for locating and interpreting ¹⁸O-labeled isotopic clusters in parent ion chromatograms without requiring product ion or selected ion monitoring/zoom spectra. Central to our algorithm is the use of linear regression to simultaneously fit all peaks in the isotope cluster rather than just the peaks representing the ¹⁸O₀, ¹⁸O₁, and ¹⁸O₂ species. We use the residuals from the regression to compute a fitting score between the theoretical and experimental isotopic distributions that can be used to rank potential candidate biomarkers. Mirgorodskaya et al. (21. Mirgorodskaya O.A., Kozmin Y.P., Titov M.I., Korner R., Sonksen C.P., Roepstorff P. Quantitation of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using ¹⁸O-labeled internal standards. Rapid Commun. Mass Spectrom.
2000; 14: 1226-1232) also used linear regression with ¹⁸O-labeled internal standards but measured the relative abundances of the ¹⁸O₁ and ¹⁸O₂ species for a given peptide in a separate experiment instead of allowing these abundances to vary in the regression.

We also describe a method of distinguishing highly up-regulated peptides from highly down-regulated or C-terminal peptides. When using highly enriched H₂¹⁸O, peptides that have fully incorporated two ¹⁸O atoms and that appear only in the labeled sample are indistinguishable from peptides that have incorporated no ¹⁸O atoms either because they were C-terminal (and thus not a substrate for trypsin) or because they were significantly down-regulated. By lowering the enrichment level of the H₂¹⁸O to, for example, 90%, a unique peak “signature” is generated that the algorithm can recognize to distinguish these two cases. Furthermore, our algorithm accounts for the residual ¹⁸O₀ abundance that remains in the labeled sample from peptides with low incorporation rates; this is particularly relevant when utilizing the two-step digestion procedure.

In a manner similar to THRASH, the algorithm performs automated “reduction” of spectra. No assumption is made regarding monoisotopic mass, which is determined using an alignment procedure similar to that described by Senko et al. (16) and utilized by THRASH. However, THRASH operates directly on the raw spectral data; this allows it to detect species at very low signal to noise ratios but also makes it very slow. Instead, we have focused on centroided peak data supplied by the vendor software from parent ion mass spectra for reasons of speed and because such data were readily available.
The algorithm is indeed quite fast, interpreting a chromatogram of ∼1,300 spectra in less than 1 min; THRASH might take several hours to process such data. The algorithm was developed using data from FT-ICR instrumentation but should be generally applicable to any instrument with sufficient resolving power.

We will describe in detail the operation of our algorithm, entitled regression analysis applied to mass spectrometry (RAAMS), and evaluate its performance using datasets with both known and unknown ratios.

COMPUTATIONAL DATA INTERPRETATION

The algorithm takes as input the centroided peaks from a single spectrum as provided by the instrument vendor software and produces as output a list of isotope clusters detected therein along with their monoisotopic, neutral masses, their centroid retention times, and the ion abundances of both the unlabeled (¹⁶O, sample A) and labeled (¹⁸O, sample B) species. These two abundances, denoted θA and θB,² respectively, can be used to find differentially expressed peptides on either an absolute (θA − θB) or relative (θB/θA) scale.

² The variables used are: θA, predicted amount of material in sample A (¹⁶O, unlabeled sample); θB, predicted amount of material in sample B (¹⁸O, labeled sample); X, matrix of theoretical isotopic abundances (see Fig. 2); y, column vector of experimentally observed peak heights (see Fig. 2); ŷ, column vector of predicted peak heights; β, column vector of parameter estimates, with β₁, β₂, and β₃ corresponding to the amounts of ¹⁸O₀, ¹⁸O₁, and ¹⁸O₂ (zero, one, or two incorporations), respectively, and β₀ corresponding to the intercept; p, fractional purity of H₂¹⁸O (in this work, 90% H₂¹⁸O (p = 0.9) is prepared by mixing H₂¹⁶O and the vendor’s >99% H₂¹⁸O); s, effective purity of ¹⁸O in the isotopic distribution of an individual peptide (higher s values indicate more incorporation of ¹⁸O).
The estimated variance-covariance matrix of θ is also returned, allowing confidence intervals to be formed.

The algorithm operates in three steps. First, a list of all potential isotope clusters is generated based on peak spacing consistent with any allowed charge state. Some isotope peaks will belong to multiple potential isotope clusters. Second, the ion abundances of each of the potential clusters are fit to a theoretical isotopic model using linear regression, giving a best fit to each in terms of the relative amounts of ¹⁸O₀, ¹⁸O₁, and ¹⁸O₂ for that peptide. Finally, the algorithm decides between potential isotope clusters that share peaks based on their residuals from the regression fits. Each step is described in further detail below and summarized in Table I.

Table I. RAAMS algorithm overview

Step 1: Find sets of peaks that could be potential isotope clusters based on their spacing. Find all charge states (1 ≤ z ≤ 6) consistent with isotope spacing of 1/z and having at least 3 peaks.
Step 2: For each potential isotope cluster:
  a) Form a matrix X with theoretical isotopic abundances, generated using averagine (16) and Mercury (22. Rockwood A.L., Van Orden S.L. Ultrahigh-speed calculation of isotope distributions. Anal. Chem. 1996; 68: 2027-2030), for incorporation of 0, 1, or 2 ¹⁸O atoms.
  b) Form vectors y with experimental peak abundances, each y vector assuming a different peak is the monoisotopic mass.
  c) Solve matrix equation using non-negative least squares.
  d) Decide between possible y vectors (monoisotopic masses) using residuals.
  e) Compute and correct for incorporation rate using ¹⁸O₁/¹⁸O₂ abundances.
Step 3: Decide between possible charge states by computing a charge assignment score that combines goodness of fit for all potential isotope clusters that share common peaks.

Step One: Examine Peak Spacing. In the first step of the algorithm, potential isotope clusters are found with spacing approximately equal to 1/z. To accomplish this, the data are divided into disjoint groups of peaks where a “group” is defined by a leading and trailing peak-free region of at least 3 m/z units. A distance matrix is computed between all pairs of peaks in this group. Charge states up to 6 are considered; elements in the distance matrix with values of 1.00235 ± error, 1.00235/2 ± error, …, 1.00235/6 ± error are of interest. The average charge state spacing of 1.00235 is from Horn et al. (20). Error in charge state spacing is allowed to be as high as 40 ppm to account for partially resolved or unresolved peaks. For each of the possible charge states from highest to lowest, the algorithm identifies adjacent pairs of peaks that have the requisite spacing. For each such pair, in order of abundance, the cluster is extended by adding those peaks that best fit the spacing of the original pair.
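The grouping and pairwise spacing test described above can be sketched as follows. This is only an illustration, not the RAAMS implementation: the function names and the example centroids are hypothetical, and the sketch assumes the 40 ppm tolerance is taken relative to the m/z of the heavier peak in each pair, which the text does not specify.

```python
from itertools import combinations

ISOTOPE_SPACING = 1.00235  # average isotope spacing at z = 1, from the text
MAX_CHARGE = 6
TOL_PPM = 40.0
GROUP_GAP = 3.0            # peak-free region (m/z units) that separates groups


def split_groups(mz_sorted, gap=GROUP_GAP):
    """Split an ascending m/z list wherever consecutive peaks are >= gap apart."""
    groups = []
    for mz in mz_sorted:
        if groups and mz - groups[-1][-1] < gap:
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return groups


def candidate_pairs(group, max_charge=MAX_CHARGE, tol_ppm=TOL_PPM):
    """Return (i, j, z) for every peak pair whose spacing matches 1.00235/z."""
    hits = []
    for i, j in combinations(range(len(group)), 2):
        gap = group[j] - group[i]
        # Assumption: the ppm tolerance is relative to the heavier peak's m/z.
        tol = group[j] * tol_ppm * 1e-6
        for z in range(max_charge, 0, -1):  # highest charge state first
            if abs(gap - ISOTOPE_SPACING / z) <= tol:
                hits.append((i, j, z))
    return hits


peaks = [500.2500, 500.7512, 501.2523, 505.1000]  # made-up centroids
groups = split_groups(peaks)
pairs = candidate_pairs(groups[0])
```

With these made-up centroids the last peak falls into its own group, and the first group yields two neighbouring pairs consistent with z = 2 plus one pair consistent with z = 1, which illustrates why several charge states can remain in play after step one.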
The algorithm stops walking outward when one of two conditions is met: (a) the next peak with appropriate spacing is greater than 3 m/z units away or (b) the average error in peak spacing for the whole run exceeds 40 ppm. In this way, a potential isotope cluster is found that has m/z spacings of approximately 1/z for a given charge state z. At least three peaks must be found for a potential isotope cluster to be passed to later stages of the algorithm. Isotope clusters resulting from low abundance species with mass less than ∼500 Da may legitimately have only two peaks (the A (¹⁸O₀) and A + 1 (¹³C₁¹⁸O₁) peaks with negligible ¹³C contribution) above the limit of detection, but these are very difficult to distinguish from noise spikes.

Three special cases combine to make it difficult to determine, based only on peak spacing, the correct charge state for a given run of peaks. (a) Noise peaks can interdigitate between real peaks. (b) Isotope clusters can legitimately overlap and share one or more peaks. (c) Low abundance peaks can fall below the signal to noise threshold of the peak detection algorithm. Examples of these are shown in Supplemental Fig. S1. To address the issue of noise peaks, legitimate peak sharing, and missing peaks, the algorithm attempts to fit all charge states that are consistent with the peak spacing data. The algorithm also tolerates up to two missing peaks internal to a candidate isotope cluster. In each of the three cases, the result is one or more isotope clusters with correct charge state assignments and one or more incorrect assignments that share peaks with the correct assignments. We decide between these correct and incorrect isotope clusters in step three of the algorithm (described below).

Step Two: Examine Peak Abundances. In step two of the algorithm, the candidate isotope clusters from step one are evaluated based on their abundances.
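Looking back at step one for a moment, the outward walk and its stopping rules can be sketched as below. This is a minimal sketch under stated assumptions rather than the authors' code: `nearest` and `extend_cluster` are hypothetical names, the search window of half an isotope spacing is an assumption, and the spacing error is expressed in ppm of the expected m/z.

```python
SPACING = 1.00235   # average isotope spacing at z = 1
TOL_PPM = 40.0      # allowed average spacing error for the whole run
MAX_REACH = 3.0     # stop if the next peak is more than 3 m/z units away
MAX_MISSING = 2     # internal missing peaks tolerated per cluster
MIN_PEAKS = 3       # smaller runs are not passed on to step two


def nearest(mz_sorted, target, window):
    """Index of the peak closest to target within +/- window, else None."""
    best = None
    for k, mz in enumerate(mz_sorted):
        d = abs(mz - target)
        if d <= window and (best is None or d < abs(mz_sorted[best] - target)):
            best = k
    return best


def extend_cluster(mz_sorted, i, j, z):
    """Walk outward from the seed pair (i, j); return member indices or None."""
    step = SPACING / z
    members = [i, j]
    errors_ppm = [abs(mz_sorted[j] - mz_sorted[i] - step) / mz_sorted[j] * 1e6]
    missing = 0
    for direction in (1, -1):              # to higher m/z, then to lower m/z
        edge = j if direction == 1 else i
        while True:
            hit = None
            for skipped in range(MAX_MISSING - missing + 1):
                offset = step * (skipped + 1)
                if offset > MAX_REACH:
                    break                  # condition (a)
                target = mz_sorted[edge] + direction * offset
                k = nearest(mz_sorted, target, step / 2)  # assumed window
                if k is not None and k not in members:
                    err = abs(mz_sorted[k] - target) / target * 1e6
                    hit = (k, skipped, err)
                    break
            if hit is None:
                break
            k, skipped, err = hit
            if (sum(errors_ppm) + err) / (len(errors_ppm) + 1) > TOL_PPM:
                break                      # condition (b)
            members.append(k)
            errors_ppm.append(err)
            missing += skipped
            edge = k
    return sorted(members) if len(members) >= MIN_PEAKS else None


peaks = [500.2500, 500.7512, 501.2523, 502.2546, 505.1000]  # made-up centroids
cluster = extend_cluster(peaks, 0, 1, 2)
```

In this made-up example the walk bridges one missing peak between 501.2523 and 502.2546 and never reaches the isolated peak at 505.1, so the z = 2 candidate has four members.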
The purpose of step two is to determine the contributions of the three label states (exchange of zero, one, or two oxygen atoms at the C terminus of the peptide) to the experimentally observed abundances of the isotope cluster by least squares fitting of a linear combination of expected isotopic abundances for each of these states. For each candidate isotope cluster, its charge state (determined in step one) and the mass of its first peak are used to compute an approximate neutral mass, and then an approximate chemical composition is determined using the average amino acid averagine, which has the chemical formula C4.9384 H7.7583 N1.3577 O1.4773 S0.0417 (16). Optionally, the algorithm can adjust up or down the number of sulfurs predicted by averagine and perform the steps below for these additional expected chemical compositions.

As shown in Fig. 2a, for each chemical composition, three theoretical isotopic distributions are generated using Mercury (22) that correspond to the three label states. The first theoretical isotopic distribution is that which would result if all the molecules of peptide had not exchanged either of their C-terminal oxygen atoms (the “¹⁸O₀ label state”) and is shown as black bars in Fig. 2a. The second theoretical distribution is that which would result if all of the molecules of peptide exchanged exactly one of their C-terminal oxygen atoms for an oxygen atom from the H₂¹⁸O used for labeling (the ¹⁸O₁ label state).
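The neutral-mass and averagine calculation that opens step two can be sketched as follows. The averagine formula is the one quoted in the text; everything else is an assumption made for illustration: the function names are hypothetical, the ion is assumed to be protonated (proton mass 1.007276 Da), standard average atomic weights are used, and atom counts are simply rounded to the nearest integer.

```python
# Averagine formula as quoted in the text.
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
# Standard average atomic weights (assumption: average, not monoisotopic).
ATOMIC_WEIGHT = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994, "S": 32.065}
PROTON = 1.007276  # assumption: charge is carried by protons


def neutral_mass(mz, z):
    """Approximate neutral mass from the m/z of a cluster's first peak."""
    return z * (mz - PROTON)


def averagine_composition(mass, sulfur_delta=0):
    """Scale averagine to `mass` and round each element to a whole atom count.

    sulfur_delta mimics the optional up/down adjustment of the sulfur count.
    """
    unit_mass = sum(AVERAGINE[e] * ATOMIC_WEIGHT[e] for e in AVERAGINE)
    n_units = mass / unit_mass
    comp = {e: round(AVERAGINE[e] * n_units) for e in AVERAGINE}
    comp["S"] = max(0, comp["S"] + sulfur_delta)
    return comp


mass = neutral_mass(1001.0, 2)           # made-up doubly charged peak
comp = averagine_composition(2000.0)
comp_extra_s = averagine_composition(2000.0, sulfur_delta=1)
```

The resulting integer composition is what an isotope-distribution calculator such as Mercury would take as input; that calculation itself is not reproduced here.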
If the purity of the H₂¹⁸O were 100%, the ¹⁸O₁ distribution would be exactly the same as that of the ¹⁸O₀ label state except shifted to the right by 2 Da and thus would be indistinguishable. As will be explained later, H₂¹⁸O of 90% purity is used instead, resulting in the ¹⁸O₁ distribution shown with light gray bars in Fig. 2a. Similarly, the third theoretical distribution (¹⁸O₂ label state) would result if all of the peptides exchanged both of their C-terminal oxygen atoms for two oxygen atoms drawn from the 90% pure H₂¹⁸O. The three theoretical isotopic distributions are arranged in the columns of a matrix, X, as shown in Fig. 2b (note that in this figure, the matrices are shown transposed). The X matrix is scaled so that each column sums to 1. A column of ones is also added to represent the intercept. For a further discussion of the regression model, see Eckel-Passow et al. (23. Eckel-Passow J.E., Oberg A.L., Therneau T.M., Mason C.J., Mahoney D.W., Johnson K.L., Olson J.E., Bergen H.R. III. Regression analysis for comparing protein samples with ¹⁶O/¹⁸O stable-isotope labeled mass spectrometry. Bioinformatics. 2006; 22: 2739-2745).

The experimental peak abundances are arranged in a column vector, y, also shown in Fig. 2b. Because it is not known beforehand which experimental peak is the monoisotopic mass (all ¹²C, all ¹⁴N, all ¹⁶O, etc.), a number of different y vectors, corresponding to different alignments of theoretical isotopes to experimental peaks, are attempted. Indeed, the monoisotopic mass may not even be detected experimentally, as in the case of species highly up-regulated in the labeled sample. Fig. 2b shows only the y vector of the correct alignment with the monoisotopic mass marked A; different alignments can be imagined by shifting this vector to the left or right.
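The construction of X and the alignment search described above can be sketched as follows. This is a simplified illustration with several labeled assumptions, not the RAAMS code: all function names are hypothetical; the unlabeled distribution `base` is a hand-entered placeholder rather than Mercury output; each exchanged oxygen is assumed to be ¹⁸O with probability p, so the labeled states are binomial mixtures of shifted copies of `base`; the intercept is constrained to be non-negative along with the other coefficients; and a plain coordinate-descent solver stands in for whatever non-negative least squares routine the authors used.

```python
def shifted(values, k, n):
    """Copy values into a zero vector of length n, starting at position k."""
    out = [0.0] * n
    for idx, v in enumerate(values):
        if 0 <= idx + k < n:
            out[idx + k] = v
    return out


def label_state_columns(base, p=0.9):
    """Columns for the 18O0, 18O1 and 18O2 states, each scaled to sum to 1."""
    n = len(base) + 4
    d0, d2, d4 = (shifted(base, k, n) for k in (0, 2, 4))
    q = 1.0 - p  # chance that an exchanged oxygen is still 16O (assumption)
    one = [q * a + p * b for a, b in zip(d0, d2)]
    two = [q * q * a + 2 * p * q * b + p * p * c for a, b, c in zip(d0, d2, d4)]
    return [[v / sum(col) for v in col] for col in (d0, one, two)]


def nnls(cols, y, sweeps=2000, tol=1e-12):
    """Non-negative least squares by cyclic coordinate descent (stand-in solver)."""
    beta = [0.0] * len(cols)
    resid = list(y)
    norms = [sum(v * v for v in col) for col in cols]
    for _ in range(sweeps):
        biggest = 0.0
        for j, col in enumerate(cols):
            grad = sum(c * r for c, r in zip(col, resid))
            new = max(0.0, beta[j] + grad / norms[j])
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
                biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    return beta, sum(r * r for r in resid)


def fit_cluster(observed, base, p=0.9, pad=2):
    """Try alignments -pad..+pad; keep the one with the smallest residual.

    Returns (alignment, [b0, b1, b2, b3], rss), with b0 the intercept.
    Assumes len(observed) <= len(base) + 4 so that no peak is dropped.
    """
    n = len(base) + 4 + 2 * pad
    cols = [[1.0] * n] + [shifted(c, pad, n) for c in label_state_columns(base, p)]
    best = None
    for a in range(-pad, pad + 1):
        y = shifted(observed, pad + a, n)  # undetected positions stay at 0
        beta, rss = nnls(cols, y)
        if best is None or rss < best[2]:
            best = (a, beta, rss)
    return best


base = [0.55, 0.30, 0.11, 0.04]          # placeholder unlabeled distribution
c0, c1, c2 = label_state_columns(base)
observed = [100 * a + 50 * c for a, c in zip(c0, c2)]  # synthetic cluster
offset, beta, rss = fit_cluster(observed, base)
```

On this synthetic cluster, built from 100 units of the ¹⁸O₀ column and 50 units of the ¹⁸O₂ column, the zero-shift alignment gives the smallest residual and the fit recovers those two amounts with negligible ¹⁸O₁ and intercept terms. Real spectra contain noise, so residuals for the competing alignments would be compared rather than expecting an exact fit.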
Where an experimental peak is not detected for a given theoretical isotope, 0 is inserted for the corresponding element of y.

The goal then is to determine the linear combination of the columns of X that most closely rese