Article; open access; peer-reviewed

Covariation of Peptide Abundances Accurately Reflects Protein Concentration Differences

2017; Elsevier BV; Volume: 16; Issue: 5; Language: English

DOI: 10.1074/mcp.o117.067728

ISSN: 1535-9484

Authors

Bo Zhang, Mohammad Pirmoradian, Roman A. Zubarev, Lukas Käll

Topic(s)

Metabolomics and Mass Spectrometry Studies

Abstract

Most implementations of mass spectrometry-based proteomics involve enzymatic digestion of proteins, expanding the analysis to multiple proteolytic peptides for each protein. Currently, there is no consensus on how to summarize peptide abundances into protein concentrations, and such efforts are complicated by the fact that error control is normally applied to the identification process and does not directly control errors linking peptide abundance measures to protein concentration. Peptides resulting from suboptimal digestion or partial modification are not representative of the protein concentration. Without a mechanism to remove such unrepresentative peptides, their abundances adversely affect the estimation of their protein's concentration. Here, we present a relative quantification approach, Diffacto, that applies factor analysis to extract the covariation of peptide abundances. The method enables weighted geometric-mean summarization and automatic elimination of incoherent peptides. We demonstrate, based on a set of controlled label-free experiments using standard mixtures of proteins, that the covariation structure extracted by the factor analysis accurately reflects protein concentrations. In the data set filtered at a 1% peptide-spectrum match (PSM)-level false discovery rate (FDR), as many as 11% of the peptides have abundance differences incoherent with those of the other peptides attributed to the same protein. If not controlled, such contradictory peptide abundances have a severe impact on protein quantification. When the quantities of each protein's three most abundant peptides are summed, as many as 14% of the proteins are estimated to have a negative correlation with their actual concentration differences between samples. Diffacto reduced the proportion of such obviously incorrectly quantified proteins to 1.6%. Furthermore, by analyzing clinical data sets from two breast cancer studies, our method revealed persistent proteomic signatures linked to three subtypes of breast cancer.
We conclude that Diffacto can facilitate the interpretation and enhance the utility of most types of proteomics data.

Mass spectrometry-based proteomics is the preferred technology for quantitative and comprehensive analysis of proteins in complex biological mixtures (1. Ong S.E. Mann M. Mass spectrometry-based proteomics turns quantitative. Nat. Chem. Biol. 2005; 1: 252-262). Because a typical experiment involves proteolytic digestion, the actual analytes measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS)¹ are the proteolytic peptides of the analyzed proteins. Inferring which proteins were present in the original mixture before digestion is problematic, especially when the proteins are homologs. This cannot be solved by increasing the accuracy with which the masses of peptide molecules and fragment ions are measured (2. Zubarev R.A. Hakansson P. Sundqvist B. Accurate monoisotopic mass measurements of peptides: Possibilities and limitations of high resolution time-of-flight particle desorption mass spectrometry. Rapid Commun. Mass Spectrom. 1996; 10: 1386-1392). Currently, there is no consensus on how such protein inference should be performed (3. Serang O. Noble W. A review of statistical methods for protein identification using tandem mass spectrometry. Statistics Interface. 2012; 5: 3-20, 4. Savitski M.M. Wilhelm M. Hahne H. Kuster B. Bantscheff M. A scalable approach for protein false discovery rate estimation in large proteomic data sets. Mol. Cell. Proteomics. 2015; 14: 2394-2404, 5. Ning Z. Zhang X. Mayne J. Figeys D. Peptide-centric approaches provide an alternative perspective to re-examine quantitative proteomic data. Anal. Chem. 2016; 88: 1973-1978).

¹ The abbreviations used are: LC-MS/MS, liquid chromatography coupled to tandem mass spectrometry; ANOVA, analysis of variance; CV, coefficient of variation; CPTAC, Clinical Proteomic Tumor Analysis Consortium; DDA, data-dependent acquisition; ERPR, estrogen or progesterone receptor; FARMS, Factor Analysis for Robust Microarray Summarization; FDR, false discovery rate; FQR, false quantification rate; HER2, human epidermal growth factor receptor 2; iTRAQ, isobaric tag for relative and absolute quantitation; LFQ, label-free quantification; LOQ, limit of quantification; MC, Monte Carlo method; FDRMC, false discovery rate based on sequential Monte Carlo simulation; MPIB, Max Planck Institute of Biochemistry; PECA, Probe-level Expression Change Averaging; PQPQ, protein quantification by peptide quality control; PSM, peptide spectrum match; RT, retention time; SILAC, stable isotope labeling by amino acids in cell culture; S/N, signal-to-noise ratio; SpC, spectral counting; TN, triple-negative; XIC, extracted ion chromatography.

Further complications arise when estimating relative protein concentrations from multiple peptide measurements. A common assumption is that peptide abundances are proportional to the concentration of their source protein (6. Walther T.C. Mann M. Mass spectrometry-based proteomics in cell biology. J. Cell Biol. 2010; 190: 491-500). Thus, it is common practice to estimate a protein's concentration by the average or aggregate of its constituent peptides' abundances (7. Ishihama Y. Oda Y. Tabata T. Sato T. Nagasu T. Rappsilber J. Mann M. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics. 2005; 4: 1265-1272, 8. Silva J.C. Gorenstein M.V. Li G.Z. Vissers J.P. Geromanos S.J. Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol. Cell. Proteomics. 2006; 5: 144-156, 9. Griffin N.M. Yu J. Long F. Oh P. Shore S. Li Y. Koziol J.A. Schnitzer J.E. Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat. Biotechnol. 2010; 28: 83-89). Theoretically, the peptide mixture obtained from an individual protein is equimolar; in reality, however, the measured peptide abundances span several orders of magnitude.
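To illustrate why aggregating raw peptide abundances can mislead, the following Python sketch compares a Top3-style summary (the sum of the three peptides with the highest mean abundance) with the average of log abundances (the log of the geometric mean) when one of four peptides moves against its protein. It assumes NumPy; the ranking rule, the function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def top3_summary(abundances):
    """Sum of the three peptides with the highest mean abundance (rows = peptides, columns = samples)."""
    order = np.argsort(abundances.mean(axis=1))[::-1]
    return abundances[order[:3]].sum(axis=0)

def log_mean_summary(abundances):
    """Mean of log2 abundances per sample, i.e. log2 of the geometric mean."""
    return np.log2(abundances).mean(axis=0)

# Hypothetical toy protein measured in four samples
conc = np.array([1.0, 2.0, 4.0, 8.0])                 # true relative concentrations
coherent = np.outer([100.0, 50.0, 20.0], conc)        # peptides proportional to the protein
incoherent = (2000.0 / conc)[None, :]                 # e.g. a shared peptide moving the other way
peptides = np.vstack([coherent, incoherent])

log_conc = np.log2(conc)
r_top3 = np.corrcoef(np.log2(top3_summary(peptides)), log_conc)[0, 1]
log_mean = log_mean_summary(peptides)
r_log_mean = np.corrcoef(log_mean, log_conc)[0, 1]
slope_log_mean = np.polyfit(log_conc, log_mean, 1)[0]
print(f"Top3: r = {r_top3:.2f}; log-mean: r = {r_log_mean:.2f}, slope = {slope_log_mean:.2f}")
```

With these numbers the incoherent peptide is also the most abundant one, so the Top3 summary correlates negatively with the true concentrations (Pearson r of about -0.6). Averaging in log space keeps the correct direction but halves the log fold change (slope 0.5), because one of the four peptides cancels part of the signal.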
Moreover, many factors can violate the assumption of proportionality. For instance, individual peptides might be subject to insufficient enzymatic cleavage or inefficient ionization, fall outside the detection range of the instrument, carry unanticipated sequence variants and modifications, share their sequence with peptides from other proteins, or fail to be measured in some of the experiments (10. Bantscheff M. Lemeer S. Savitski M.M. Kuster B. Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present. Anal. Bioanal. Chem. 2012; 404: 939-965). Therefore, for many proteins, the quantitative data on constituent peptides are incomplete and sometimes incoherent. To remedy this, some studies propose advanced algorithms employing powerful statistical methods (11. Clough T. Thaminy S. Ragg S. Aebersold R. Vitek O. Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs. BMC Bioinformatics. 2012; 13: S6, 12. Cox J. Hein M.Y. Luber C.A. Paron I. Nagaraj N. Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics. 2014; 13: 2513-2526, 13. Choi M. Chang C.Y. Clough T. Broudy D. Killeen T. MacLean B. Vitek O. MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics. 2014; 30: 2524-2526), while others conduct a peptide-centric analysis to avoid the inference problem (5, 14. Ting Y.S. Egertson J.D. Payne S.H. Kim S. MacLean B. Käll L. Aebersold R. Smith R.D. Noble W.S. MacCoss M.J. Peptide-centric proteome analysis: an alternative strategy for the analysis of tandem mass spectrometry data. Mol. Cell. Proteomics. 2015; 14: 2301-2307, 15. Suomi T. Corthals G.L. Nevalainen O.S. Elo L.L. Using peptide-level proteomics data for detecting differentially expressed proteins. J. Proteome Res. 2015; 14: 4564-4570). Nonetheless, most traditional methods do not make use of the covariation of peptide abundances measured under different conditions. By putting more trust in peptides whose abundances covary more strongly with those of the other peptides from the same protein, one can make better use of the proportionality principle. Approaches exploiting such covariation have been shown to improve the validity of protein inference and signal integration (16. Webb-Robertson B.J. Matzke M.M. Datta S. Payne S.H. Kang J. Bramer L.M. Nicora C.D. Shukla A.K. Metz T.O. Rodland K.D. Smith R.D. Tardiff M.F. McDermott J.E. Pounds J.G. Waters K.M. Bayesian proteoform modeling improves protein quantification of global proteomic measurements. Mol. Cell. Proteomics. 2014; DOI 10.1074/mcp.O113.030932, 17. Lukasse P.N.J. America A.H.P. Protein inference using peptide quantification patterns. J. Proteome Res. 2014; 13: 3191-3199, 18. Goeminne L.J. Gevaert K. Clement L. Peptide-level robust ridge regression improves estimation, sensitivity, and specificity in data-dependent quantitative label-free shotgun proteomics. Mol. Cell. Proteomics. 2016; 15: 657-668), or to provide a basis for selecting peptides for quantitative analysis (19. Forshed J. Johansson H.J. Pernemalm M. Branca R.M. Sandberg A. Lehtio J. Enhanced information output from shotgun proteomics data by protein quantification and peptide quality control (PQPQ). Mol. Cell. Proteomics. 2011; 10, 20. Zhu Y. Hultin-Rosenberg L. Forshed J. Branca R.M. Orre L.M. Lehtio J. SpliceVista, a tool for splice variant identification and visualization in shotgun proteomics data. Mol. Cell. Proteomics. 2014; 13: 1552-1562). However, these approaches have drawbacks: they depend on specific quantification techniques or have difficulty handling missing values, and they often incorrectly treat all peptides as independent variables when summarizing each individual LC-MS/MS experiment.

The problem of peptide signal integration encountered in proteomics actually has an analog in transcriptomics. In gene expression microarrays, the biomolecules of interest are full transcripts, whereas the technology measures multiple moieties of the transcripts, i.e. probes (11, 21. Lockhart D.J. Dong H. Byrne M.C. Follettie M.T. Gallo M.V. Chee M.S. Mittmann M. Wang C. Kobayashi M. Horton H. Brown E.L. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 1996; 14: 1675-1680, 22. Pavelka N. Fournier M.L. Swanson S.K. Pelizzola M. Ricciardi-Castagnoli P. Florens L. Washburn M.P. Statistical similarities between transcriptomics and quantitative shotgun proteomics data. Mol. Cell. Proteomics. 2008; 7: 631-644).
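The covariation principle discussed above can be illustrated with a simple heuristic: score each peptide by the Pearson correlation between its log-abundance profile and the mean profile of the other peptides of the same protein, and treat negatively correlated peptides as incoherent. The Python sketch below assumes NumPy and uses hypothetical toy data; the leave-one-out score and the cut-off at zero are illustrative assumptions, not the algorithm of any of the cited tools.

```python
import numpy as np

def leave_one_out_correlation(log_abund):
    """log_abund: peptides x samples (log2). For each peptide, return the Pearson correlation
    between its profile and the mean profile of the remaining peptides of the same protein."""
    scores = np.empty(log_abund.shape[0])
    for i in range(log_abund.shape[0]):
        others = np.delete(log_abund, i, axis=0).mean(axis=0)
        scores[i] = np.corrcoef(log_abund[i], others)[0, 1]
    return scores

# Hypothetical log2 concentrations of one protein in six samples
log_conc = np.log2([1.0, 2.0, 4.0, 8.0, 4.0, 2.0])
noise = np.array([0.1, -0.1, 0.0, 0.1, -0.1, 0.0])
peptides = np.vstack([
    7.0 + log_conc,            # coherent peptide
    5.0 + log_conc + noise,    # coherent peptide with some measurement noise
    4.0 + log_conc,            # coherent peptide
    10.0 - log_conc,           # incoherent peptide, e.g. shared or partially modified
])

scores = leave_one_out_correlation(peptides)
incoherent = scores < 0.0                        # hypothetical cut-off
protein_profile = peptides[~incoherent].mean(axis=0)
```

Because Pearson correlation ignores constant offsets, each peptide's response factor (its offset in log space) does not influence the score; only its direction of change across samples does. This is why covariation, rather than absolute intensity, is informative about which peptides track the protein.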
Recent technological advances in LC-MS/MS have brought proteomics to a state where its proteome coverage is comparable to that of microarrays (6, 23. Cox J. Mann M. Is proteomics the new genomics? Cell. 2007; 130: 395-398, 24. Pirmoradian M. Budamgunta H. Chingin K. Zhang B. Astorga-Wells J. Zubarev R.A. Rapid and deep human proteome analysis by single-dimension shotgun proteomics. Mol. Cell. Proteomics. 2013; 12: 3330-3338). Although the selected sets of probes in a microarray experiment may exhibit varying affinity and genome-wide specificity (25. Wu Z.J. Irizarry R.A. Gentleman R. Martinez-Murillo F. Spencer F. A model-based background adjustment for oligonucleotide expression arrays. J. Am. Statistical Assoc. 2004; 99: 909-917), the veracity of the target transcripts is seldom questioned. One might then ask why proteomics, which also has multiple measurements for every target protein, requires every reporter peptide to be uniquely attributed to its source protein and correctly identified by MS/MS, preferably in every sample. Such stringent requirements might provide a false sense of security, as it is easy to believe that correct identifications are well suited for quantification. However, the actual relation between peptide identification and quantification may very well be the reverse: as we found in our previous study (26. Zhang B. Käll L. Zubarev R.A. DeMix-Q: quantification-centered data processing workflow. Mol. Cell. Proteomics. 2016; 15: 1467-1478), well-characterized chromatographic features have a better chance of being associated with correct peptide identities.
In any case, the rate of false association between peptide identity and peptide quantity has not been fully investigated, and this issue is often ignored altogether. As sample sizes in proteomics studies increase, false quantifications may accumulate to a non-negligible magnitude and affect the outcome of studies. Fortunately, the problem of quantitatively aggregating multiple reporters into a single readout has been thoroughly investigated in microarray analysis for decades, and a set of well-characterized procedures has been developed (25, 27. Smyth G.K. Michaud J. Scott H.S. Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics. 2005; 21: 2067-2075, 28. Hochreiter S. Clevert D.A. Obermayer K. A new summarization method for Affymetrix probe level data. Bioinformatics. 2006; 22: 943-949). We argue that these hard-earned insights from microarray analysis can also be applied in proteomics to improve its quantification accuracy.

In particular, we propose a differential analysis approach that we dubbed Diffacto. A popular Bayesian factor analysis algorithm (28, 29. Talloen W. Clevert D.A. Hochreiter S. Amaratunga D. Bijnens L. Kass S. Gohlmann H.W. I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data. Bioinformatics. 2007; 23: 2897-2902) has been implemented in this approach to handle incoherent reporter behavior. The factor analysis extracts differential signals by utilizing the covariation, over multiple experiments, of the abundances of a group of correlated peptides tentatively linked to a dominant proteoform. In contrast to the popular principal component analysis, factor analysis strives to explain the covariance between observables rather than the variance within the observables, because the latter is mainly caused by random noise. In this regard, factor analysis explicitly assumes the presence of noise and is thus more elaborate than principal component analysis. The signal (factor) represents the protein concentration change extracted from the correlations of measurements across multiple conditions. The signal-to-noise ratio (S/N) is then estimated for every group of peptides attributed to a single protein, to determine whether the group is informative or too contradictory to be quantified reliably. Informative groups of peptides may still contain incoherent peptides whose signals contradict those of the other peptides. Such peptides are eliminated from the group before the relative difference in protein concentration is estimated as a weighted geometric mean of the differences in abundance of the remaining peptides. By eliminating uninformative groups and incoherent peptide data, Diffacto reduces noise while leaving the quantitative signal largely intact, thereby allowing one to extract more useful biological information from the same proteomics data set. We demonstrate that Diffacto is a robust, sensitive and flexible method for differential proteome analysis, well suited for quantification-centered proteomics (26).
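The steps described above (factor extraction, S/N screening, removal of incoherent peptides and weighted geometric-mean summarization) can be sketched with a one-factor Gaussian factor model fitted by expectation-maximization (EM) in NumPy. This is a simplified stand-in, not Diffacto's implementation: the Bayesian factor analysis cited above adds priors that this sketch omits, and the S/N definition (lambda^T Psi^-1 lambda), the S/N cut-off of 1, the sign-based incoherence rule, the use of loadings as weights and all function names are assumptions. Because a weighted mean of log abundances equals the logarithm of a weighted geometric mean, the summary is computed in log space.

```python
import numpy as np

def fit_one_factor(X, n_iter=1000, tol=1e-9):
    """EM fit of x = lam * z + eps with z ~ N(0, 1) and eps ~ N(0, diag(psi)).
    X: samples x peptides, each column centered."""
    S = X.T @ X / X.shape[0]                       # sample covariance of the peptides
    lam = 0.5 * np.sqrt(np.diag(S))                # initial loadings
    psi = 0.5 * np.diag(S) + 1e-6                  # initial unique (noise) variances
    for _ in range(n_iter):
        sigma = np.outer(lam, lam) + np.diag(psi)
        beta = np.linalg.solve(sigma, lam)         # Sigma^-1 lam
        ezz = 1.0 - beta @ lam + beta @ S @ beta   # posterior E[z^2], averaged over samples
        lam_new = S @ beta / ezz
        psi = np.maximum(np.diag(S) - lam_new * (S @ beta), 1e-6)
        done = np.max(np.abs(lam_new - lam)) < tol
        lam = lam_new
        if done:
            break
    if np.median(lam) < 0:                         # the sign of the factor is arbitrary
        lam = -lam
    return lam, psi

def summarize_protein(log_abund, snr_min=1.0):
    """log_abund: samples x peptides (log2) for one protein.
    Returns (protein log2 profile or None, kept-peptide mask, S/N)."""
    X = log_abund - log_abund.mean(axis=0)
    lam, psi = fit_one_factor(X)
    snr = float(np.sum(lam ** 2 / psi))            # assumed S/N: lam^T Psi^-1 lam
    if snr < snr_min:                              # uninformative peptide group
        return None, np.zeros(lam.size, dtype=bool), snr
    keep = lam > 0                                 # drop peptides that move against the factor
    w = lam[keep] / lam[keep].sum()                # loadings as weights (assumption)
    return X[:, keep] @ w, keep, snr               # weighted log mean = log of weighted geometric mean

# Hypothetical protein: four coherent peptides and one incoherent peptide in 12 samples
rng = np.random.default_rng(1)
t = np.linspace(-2.0, 2.0, 12)                     # true log2 concentration changes
coherent = t[:, None] + np.array([20.0, 18.0, 17.0, 15.0]) + rng.normal(0, 0.1, (12, 4))
incoherent = (12.0 - t + rng.normal(0, 0.1, 12))[:, None]
peptides = np.hstack([coherent, incoherent])

profile, keep, snr = summarize_protein(peptides)
naive = (peptides - peptides.mean(axis=0)).mean(axis=1)
slope_fa = np.polyfit(t, profile, 1)[0]
slope_naive = np.polyfit(t, naive, 1)[0]
```

On these simulated data the last peptide receives a negative loading and is dropped, and the slope of the weighted profile against the true changes is close to 1, whereas the plain average over all five peptides is attenuated to about 0.6. The factor model, unlike principal component analysis, gives every peptide its own noise variance psi, which is what allows a separate S/N estimate per peptide group.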
An Orbitrap Q-Exactive Plus mass spectrometer was connected to an ultrahigh-performance LC system (50-cm EASY-Spray column driven by an EASY-nLC 1000 pump); all instruments were produced by Thermo Fisher Scientific (Bremen, Germany). Each sample was injected three times and analyzed in single-shot experiments with an 80-min LC gradient, in which the primary full-range (m/z 375–1400 Th) MS spectra were acquired at high resolution (140,000). Following every primary MS spectrum, one secondary MS spectrum (resolution 17,500) was acquired in a restricted m/z range (375–481, 479–601, or 599–1400 Th) to trigger data-dependent acquisition (top-10 DDA, dynamic exclusion 15 s) of tandem mass spectra (resolution 17,500). This segmented DDA approach (30. Vincent C.E. Potts G.K. Ulbrich A. Westphall M.S. Atwood 3rd, J.A. Coon J.J. Weatherly D.B. Segmentation of precursor mass range using "tiling" approach increases peptide identifications for MS1-based label-free quantification. Anal. Chem. 2013; 85: 2825-2832) minimized the redundancy of MS/MS spectra among the three LC-MS/MS runs. To increase peptide identification efficiency by multiplexing MS/MS spectra of cofragmenting peptides (31. Zhang B. Pirmoradian M. Chernobrovkin A. Zubarev R.A. DeMix workflow for efficient identification of cofragmented peptides in high resolution data-dependent tandem mass spectrometry. Mol. Cell. Proteomics. 2014; 13: 3211-3223), the precursor isolation windows in the three runs were set to 2.0, 4.0 and 6.0 Th, respectively, and the normalized collision energy (NCE) for higher-energy collision dissociation (HCD) was set to 29, 30 and 31, respectively. The choices of window widths and energies were based on empirical knowledge about optimal instrument settings (24) and on the density of precursors in the corresponding m/z ranges.

Standard digests (purchased from Promega, Madison, WI) of human cell lysates, yeast cell lysates and bovine serum albumin (BSA) were mixed at twenty different ratios (supplemental Table S1). The proportion of human peptides was reduced linearly, whereas the fraction of BSA peptides increased exponentially and the share of yeast peptides increased nonlinearly, so that all samples had equal total amounts of peptides. For each sample, 5.0 μg of peptide mixture was dissolved in 30 μl of solution, of which 6 μl was injected per LC-MS/MS run, three times in total (i.e. 1.0 μg of peptides per injection). Raw and converted data were deposited in MassIVE (MSV000079811) and ProteomeXchange (PXD004308).

We identified peptides using the DeMix workflow (31), in which the MS/MS spectra were demultiplexed by matching the isolation windows with the chromatographic feature maps generated from the full-range (survey) MS spectra by OpenMS FeatureFinderCentroided (ver. 2.0) (32. Kohlbacher O. Reinert K. Gropl C. Lange E. Pfeifer N. Schulz-Trieglaff O. Sturm M. TOPP–the OpenMS proteomics pipeline. Bioinformatics. 2007; 23: e191-e197). MS/MS spectra with the original and the extended precursor information were searched independently against a concatenated UniProt (33. UniProt C. UniProt: a hub for protein information. Nucleic Acids Res. 2015; 43: D204-D212) reference proteome database (6720 yeast protein sequences of release 2015_12, 91618 human protein sequences of release 2015_07, and the sequence of BSA, UniProt ID P02769) using the Morpheus search engine (ver. 165) (34. Wenger C.D. Coon J.J. A proteomics search algorithm specifically designed for high-resolution tandem mass spectra. J. Proteome Res. 2013; 12: 1377-1386). Carbamidomethylation of cysteine was set as a fixed modification, and oxidation of methionine was considered as a variable modification. The target-decoy approach was applied, and one missed tryptic cleavage was allowed (no proline rule). Precursor and product mass tolerances were set to 6 ppm and 18 ppm, respectively. The resulting peptide-spectrum matches (PSMs) were filtered by q-value (<1%) for each individual run.

Peptide-level identification and quantification were integrated through the DeMix-Q workflow (26), in which peptide chromatographic features were peak-picked from the full-range (primary) MS spectra and tentatively associated with available PSMs using OpenMS IDMapper (ver. 2.0) (32). Thereafter, MapAlignerPoseClustering (ver. 2.0) was applied (maximum RT difference 180, maximum precursor mass difference 5 ppm) to align all feature maps to the reference run (the run with the largest number of peptide-like chromatographic features) and to calibrate RT to a similar scale. Subsequently, FeatureLinkerUnlabeledQT (ver. 2.0) was used to link chromatographic features across the LC-MS/MS runs and generate a consensus feature map. The consensus map provided the basis for the subsequent identity propagation, in which peptide identities were transferred from runs with PSM information to runs without MS/MS information. To further increase the coverage of quantitative information, a more sensitive signal extraction based on extracted ion chromatography (XIC) was applied with EICExtractor (ver. 2.0). Quantities from XICs were propagated to runs in which a feature was not initially covered by the consensus map but precursor mass peaks were detected within a given retention time and m/z window around the consensus feature (60 s and 5 ppm). An estimated 5% feature-level FDR was applied as a quality threshold for this process (26). If a consensus feature was linked to PSMs with different sequences, only the most common sequence was kept. Peptide abundances were reported as the sum of feature abundances from all charge-state and modification forms of the respective sequence and were normalized by the average of the valid peptide abundance measurements in each individual run.

For two clinical studies, peptide identification and quantification results were obtained from the supplemental materials of the original publications, without reprocessing the mass spectrometry data. (1) The CPTAC breast cancer data set was acquired from the CPTAC study (Mertins et al. 2016) (35. Mertins P. Mani D.R. Ruggles K.V. Gillette M.A. Clauser K.R. Wang P. Wang X. Qiao J.W. Cao S. Petralia F. Kawaler E. Mundt F. Krug K. Tu Z. Lei J.T. Gatza M.L. Wilkerson M. Perou C.M. Yellapantula V. Huang K.L. Lin C. McLellan M.D. Yan P. Davies S.R. Townsend R.R. Skates S.J. Wang J. Zhang B. Kinsinger C.R. Mesri M. Rodriguez H. Ding L. Paulovich A.G. Fenyo D. Ellis M.J. Carr S.A. Nci C. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature. 2016; 534: 55-62). This set was normalized with an approach similar to that of the original study. Peptide iTRAQ log-ratios (relative to the internal reference) of 80 breast cancer samples that passed quality control (77 samples and 3 replicate measurements) were normalized by kernel density estimation of two-component Gaussian mixture models and zero-centered by subtracting the mean log-ratio of the major Gaussian component. Peptides quantified in no more than 30 samples were discarded. (2) The MPIB breast cancer data set was acquired from the original study conducted at the Max Planck Institute of Biochemistry, Germany (Tyanova et al. 2016) (36. Tyanova S. Albrechtsen R. Kronqvist P. Cox J. Mann M. Geiger T. Proteomic maps of breast cancer subtypes. Nat. Commun. 2016; 7: 10259).
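The final normalization rules described above can be sketched in Python with NumPy. This is a hypothetical illustration, not the DeMix-Q or CPTAC code, and all function names and toy data are assumptions: `resolve_sequence` keeps the most common sequence linked to a consensus feature, `peptide_matrix` sums the charge-state and modified forms of each sequence, and `normalize_by_run_mean` divides each run by the mean of its valid measurements. For the iTRAQ log-ratios, a plain EM fit of a two-component Gaussian mixture is swapped in for the kernel-density-based procedure described, and the centering is applied before the more-than-30-samples filter; both choices are assumptions.

```python
from collections import Counter
import numpy as np

def resolve_sequence(psm_sequences):
    """Keep only the most common peptide sequence among the PSMs linked to one consensus feature."""
    return Counter(psm_sequences).most_common(1)[0][0]

def peptide_matrix(features, runs):
    """features: (run, sequence, abundance) tuples, one per charge state / modified form.
    Returns sorted sequences and a peptides x runs matrix (NaN = not quantified)."""
    seqs = sorted({seq for _, seq, _ in features})
    row = {s: i for i, s in enumerate(seqs)}
    col = {r: j for j, r in enumerate(runs)}
    m = np.full((len(seqs), len(runs)), np.nan)
    for run, seq, value in features:
        i, j = row[seq], col[run]
        m[i, j] = value if np.isnan(m[i, j]) else m[i, j] + value   # sum over forms
    return seqs, m

def normalize_by_run_mean(m):
    """Divide each run (column) by the mean of its valid (non-NaN) peptide abundances."""
    return m / np.nanmean(m, axis=0)

def major_component_mean(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM; return the mean of the heavier component."""
    x = x[~np.isnan(x)]
    mu = np.percentile(x, [10, 90])
    sd = np.full(2, x.std() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / sd   # 1/sqrt(2*pi) cancels
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return mu[np.argmax(pi)]

def center_log_ratios(log_ratios, max_discard=30):
    """log_ratios: peptides x samples (NaN = missing). Zero-center every sample on its major
    mixture component, then drop peptides quantified in no more than `max_discard` samples."""
    centers = np.array([major_component_mean(col) for col in log_ratios.T])
    centered = log_ratios - centers
    keep = np.sum(~np.isnan(centered), axis=1) > max_discard
    return centered[keep], keep

# Label-free part with hypothetical features
feats = [("r1", "PEPTIDEK", 100.0), ("r1", "PEPTIDEK", 50.0), ("r1", "AAAK", 30.0), ("r2", "PEPTIDEK", 200.0)]
seqs, m = peptide_matrix(feats, ["r1", "r2"])
norm = normalize_by_run_mean(m)

# iTRAQ-like part with simulated log-ratios: 80% "unchanged" peptides plus a shifted minority
rng = np.random.default_rng(7)
shift = rng.normal(0, 0.5, 40)                                   # hypothetical per-sample offsets
major = rng.random((300, 40)) < 0.8
ratios = np.where(major, rng.normal(0, 0.2, (300, 40)), rng.normal(-2, 0.3, (300, 40))) + shift
ratios[rng.random((300, 40)) < 0.1] = np.nan                     # about 10% missing values
centered, keep = center_log_ratios(ratios)
```

Centering on the heavier mixture component, rather than on the overall mean, keeps a sizeable minority of strongly changed peptides from pulling the zero point of each sample.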

References