Article · Open access · Peer-reviewed

From coarse to fine: the absolute Escherichia coli proteome under diverse growth conditions

2021; Springer Nature; Volume: 17; Issue: 5; Language: English

10.15252/msb.20209536

ISSN

1744-4292

Authors

Matteo Mori, Zhongge Zhang, Amir Banaei‐Esfahani, Jean‐Benoît Lalanne, Hiroyuki Okano, Ben C. Collins, Alexander Schmidt, Olga T. Schubert, Deok‐Sun Lee, Gene‐Wei Li, Ruedi Aebersold, Terence Hwa, Christina Ludwig

Topic(s)

Bacterial Genetics and Biotechnology

Abstract

Article · 25 May 2021 · Open Access · Transparent process

Author Information

Matteo Mori (1), Zhongge Zhang (2), Amir Banaei-Esfahani (3), Jean-Benoît Lalanne (4, 5), Hiroyuki Okano (1), Ben C Collins (3, 6), Alexander Schmidt (7), Olga T Schubert (8), Deok-Sun Lee (9), Gene-Wei Li (4), Ruedi Aebersold (3, 10)*, Terence Hwa (1, 2)* and Christina Ludwig (11)*

(1) Department of Physics, University of California at San Diego, La Jolla, CA, USA
(2) Section of Molecular Biology, Division of Biological Sciences, University of California at San Diego, La Jolla, CA, USA
(3) Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
(4) Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
(5) Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
(6) School of Biological Sciences, Queen's University of Belfast, Belfast, UK
(7) Biozentrum, University of Basel, Basel, Switzerland
(8) Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
(9) School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
(10) Faculty of Science, University of Zurich, Zurich, Switzerland
(11) Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich (TUM), Freising, Germany

*Corresponding authors; these authors contributed equally to this work. Ruedi Aebersold, Tel: +41 44 633 3170. Terence Hwa, Tel: +1 858 534 7263. Christina Ludwig, Tel: +49 8161 71 6199.

ORCID iDs: Matteo Mori 0000-0002-6263-8021; Amir Banaei-Esfahani 0000-0002-9533-7647; Jean-Benoît Lalanne 0000-0001-8753-0669; Ben C Collins 0000-0003-0827-3495; Alexander Schmidt 0000-0002-3149-2381; Olga T Schubert 0000-0002-2613-0714; Gene-Wei Li 0000-0001-7036-8511; Ruedi Aebersold 0000-0002-9576-3267; Terence Hwa 0000-0003-1837-6842; Christina Ludwig 0000-0002-6131-7322

Mol Syst Biol (2021) 17: e9536. https://doi.org/10.15252/msb.20209536

Abstract

Accurate measurements of cellular protein concentrations are invaluable for quantitative studies of gene expression and physiology in living cells. Here, we developed a versatile mass spectrometric workflow based on data-independent acquisition proteomics (DIA/SWATH) together with a novel protein inference algorithm (xTop).
We used this workflow to accurately quantify absolute protein abundances in Escherichia coli for > 2,000 proteins across > 60 growth conditions, including nutrient limitations, non-metabolic stresses, and non-planktonic states. The resulting high-quality dataset of protein mass fractions allowed us to characterize proteome responses from a coarse (groups of related proteins) to a fine (individual protein) level. In this way, a number of novel biological findings were elucidated, including the generic upregulation of low-abundance proteins under various metabolic limitations, the non-specificity of catabolic enzymes upregulated under carbon limitation, the lack of large-scale proteome reallocation under stress compared to nutrient limitations, and surprising strain-dependent effects important for biofilm formation. These results provide a valuable resource for the systems biology community and can be used in future multi-omics studies of gene regulation and metabolic control in E. coli.

Synopsis

Accurate proteomic measurements of absolute protein mass fractions in Escherichia coli allowed the characterization of proteome responses under > 60 diverse growth conditions from a coarse (groups of related proteins) to a fine (individual protein) level. The study presents a mass spectrometric workflow based on data-independent acquisition proteomics and a novel protein inference algorithm (xTop) optimized for absolute protein quantification. The mass spectrometric data were benchmarked and calibrated with absolute protein mass fractions obtained by ribosome profiling. A plethora of novel biological findings are presented, including a lack of large-scale proteome reallocation under stress compared to nutrient limitations, the regulation of outer membrane proteins, and effects important for motility and biofilm formation.

Introduction

Proteins are one of the key molecular players in living cells, directly affecting cell behavior through a myriad of activities.
They are controlled, regulated, and fine-tuned in time and space through various mechanisms, including protein synthesis, turnover, post-translational modifications, and protein–protein interactions. In comparison to DNA or RNA, proteins provide a more direct readout of cellular functions and phenotypes, since they are the biomolecules that catalyze most biochemical reactions. Therefore, quantitative measurements of proteins, their turnover rates, their modifications, or their interaction status provide direct snapshots of cellular processes, allowing gene expression to be linked to physiology and phenotypes.

Over the last decades, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) has matured into the method of choice for generating quantitative proteomic data (Aebersold & Mann, 2003, 2016). A specific challenge for systems-level studies is the reliable quantification of thousands of proteins, including low-abundance proteins, across large sample cohorts from a variety of growth conditions, phenotypes, or strains (Rost et al, 2015). Both relative protein quantification (allowing cross-sample comparisons for the same protein) and absolute protein quantification (allowing cross-protein comparisons in the same sample) provide crucial information on the activity of biochemical and regulatory pathways, the stoichiometry of protein complexes, and the relationship between gene expression and cellular phenotype (Ludwig & Aebersold, 2014; Schubert et al, 2015; Schmidt et al, 2016). Furthermore, accurate measurements of absolute protein abundances (e.g., “number of proteins per cell” or “protein mass fractions”), together with knowledge of the cell volume, yield cellular protein concentrations (Appendix Note S1). These can be combined with other omics data to obtain detailed biochemical information.
For example, translational efficiencies of mRNAs can be obtained if data on absolute mRNA concentrations are available (Li et al, 2014), and enzymatic parameters can be investigated if the concentrations of the metabolites associated with an enzyme and the flux it carries are known (Schubert et al, 2015).

The Gram-negative bacterium Escherichia coli is one of the best-characterized model organisms and a workhorse for microbial genetics, biotechnology, and systems biology, thanks to many decades of rigorous molecular and physiological studies (Lee, 1996; Neidhardt, 1996; Bremer & Dennis, 2008; Karp et al, 2018). In the past decade, substantial advances have been made in the quantitative characterization of the E. coli proteome, driven in part by efforts to elucidate the cost of protein synthesis and the allocation of proteomic resources in different growth conditions (Basan et al, 2015a; Hui et al, 2015; Peebo et al, 2015; Caglar et al, 2017; Erickson et al, 2017). Most of these proteomic studies focused on the absolute abundances of groups of proteins, e.g., the combined abundance of all enzymes involved in glycolysis or in amino acid synthesis. Quantitative data on protein abundances collected at this coarse-grained level across a spectrum of relevant growth conditions showed that the cost of protein synthesis is key to explaining a number of ubiquitous microbial phenomena, e.g., catabolite repression (You et al, 2013; Hui et al, 2015), metabolic overflow (Basan et al, 2015a; Peebo et al, 2015), and diauxic shift (Erickson et al, 2017).
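Two quantities used in this Introduction can be made concrete with a short sketch: a protein mass fraction (a protein's share of total protein mass), a coarse-grained group abundance (the sum of the mass fractions of the proteins assigned to a group, such as all glycolytic enzymes), and the conversion of copies per cell into a molar concentration given the cell volume. This is a minimal Python illustration, assuming copy numbers and molecular weights are already known; the function names, protein names, and numbers are hypothetical placeholders rather than data or code from this study, whose own conversions are described in Appendix Note S1.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def mass_fractions(copies_per_cell, mol_weight_da):
    """Protein mass fraction: (copies x molecular weight) / total protein mass."""
    masses = {p: copies_per_cell[p] * mol_weight_da[p] for p in copies_per_cell}
    total = sum(masses.values())
    return {p: m / total for p, m in masses.items()}


def group_fraction(fractions, members):
    """Coarse-grained abundance of a protein group: sum of its members' mass fractions."""
    return sum(fractions[p] for p in members)


def copies_to_molar(copies_per_cell, cell_volume_fl):
    """Molar concentration (mol/L) from copies per cell and cell volume in femtoliters."""
    volume_liters = cell_volume_fl * 1e-15
    return copies_per_cell / (AVOGADRO * volume_liters)


# Hypothetical toy proteome: names, copy numbers, and weights are placeholders.
copies = {"protA": 10_000, "protB": 5_000, "protC": 1_000}
mw = {"protA": 30_000.0, "protB": 60_000.0, "protC": 100_000.0}

phi = mass_fractions(copies, mw)
print(phi)                                      # protA, protB: 3/7 each; protC: 1/7
print(group_fraction(phi, ["protA", "protB"]))  # 6/7, about 0.857
print(copies_to_molar(1_000, 1.0))              # about 1.66e-06 mol/L (1.66 uM)
```

In a real proteome the sum in `mass_fractions` runs over all quantified proteins, so a group's fraction depends on how completely the rest of the proteome is covered.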
While the accuracy of quantitative proteomics at that time was not sufficient for quantitative statements on the abundances of individual proteins, abundance estimates based on ribosome profiling did provide insightful quantitative information at the individual protein level, e.g., for quantitatively assessing the fitness effect of expressing a single metabolic protein and the stoichiometric relation between proteins in protein complexes (Li et al, 2014). However, the elaborate workflow and high cost of ribosome profiling make this demanding method difficult to apply to a large number of growth conditions.

A large step toward comprehensive quantitation of E. coli proteomes was made by Schmidt et al (2016), who calibrated mass spectrometric protein intensities using quantified external standards (AQUA peptides) for a subset of 41 proteins expressed at different abundances. This study investigated proteome allocation, expression regulation, and post-translational adaptations of E. coli across 22 different growth conditions. However, despite the improvement in quantitation, its major findings either considered only the total abundance of groups of proteins or were not quantitative in nature. A detailed analysis presented in this work shows that the accuracy of absolute abundance quantitation using AQUA peptide calibration is in fact limited. One key challenge for accurate quantitation of absolute protein abundances in bottom-up proteomics is that peptides, rather than proteins, are the measured analytes. Absolute protein abundance therefore needs to be inferred from peptide abundances, which is not straightforward, because different peptide precursors from the same protein can yield very different intensities. Even when external standards such as AQUA peptides are used, they only provide information on the proteins from which those peptides are derived.
Additionally, accurate absolute quantification with AQUA peptides is very expensive, labor-intensive, technically challenging, and can still be error-prone (Ludwig & Aebersold, 2014).

In this study, we describe a versatile workflow that accurately quantifies the absolute abundances of thousands of E. coli proteins at the individual protein level across many conditions. We demonstrate the usefulness of the generated datasets through extensive biological analyses of numerous individual proteins, which has not previously been done in proteomic studies of E. coli. Further utility at the individual protein level will be shown in follow-up studies, in which we will combine the data generated here with other omics approaches. Compared to previous studies, our approach provides comprehensive, accurate, and reproducible high-throughput quantification at low cost and on a reasonably fast timescale (1 h per sample). Our pipeline is based on data-independent acquisition mass spectrometry (DIA/SWATH (Gillet et al, 2012; Chapman et al, 2014; Ludwig et al, 2018)), for which we generated a tailor-made, comprehensive E. coli spectral library covering 64% of all annotated E. coli proteins. DIA/SWATH mass spectrometry applied to E. coli proteomes has recently been shown to provide excellent quantitative results in terms of precision, reproducibility, and depth of proteome coverage (Midha et al, 2020). Furthermore, we developed a novel peptide-to-protein inference algorithm, named xTop, which combines the intensities of the unique peptides of a given protein across all samples at hand to infer the intensity of that protein. We show that xTop is superior in estimating relative protein abundances across samples compared to other commonly used algorithms, such as iBAQ (Schwanhausser et al, 2011) or TopPepN (Silva et al, 2006; Ludwig et al, 2012; Rosenberger et al, 2014).
We benchmarked these protein inference methods, along with ribosome profiling, for their estimates of absolute protein abundances against a set of spiked-in reference peptides (AQUA), as well as against a number of internal references offered by protein complexes with known stoichiometry. We established that absolute protein abundances inferred from ribosome profiling data are the most accurate among the methods tested. We therefore calibrated the relative protein abundances provided by proteomics and xTop to the absolute abundances obtained from ribosome profiling, thereby obtaining accurate protein abundances across a large number of samples.

Finally, we applied our workflow to explore the E. coli proteome across ~ 60 growth conditions. Here we went well beyond nutrient limitation (carbon, phosphate, oxygen) and included anaerobic growth, various non-metabolic stresses (high temperature, hyperosmolarity, acetate, ethanol, oxidative stress), and conditions favoring non-planktonic growth, such as biofilm and colony growth. A total of 2,335 proteins were detected in 66 samples across these conditions. This comprehensive dataset allowed us to characterize proteome responses at the global level and, crucially, for individual proteins in unprecedented detail. At the level of protein sectors (groups of proteins exhibiting similar response patterns under metabolic limitations or antibiotic inhibition), the responses of abundant proteins matched those previously seen for sector aggregates (Peebo et al, 2015; Schmidt et al, 2016; Caglar et al, 2017). However, a large number of newly detected, low-abundance proteins exhibited distinct responses that were unresolved in previous studies.
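The calibration step described above, in which relative abundances from proteomics are anchored to absolute abundances from ribosome profiling, can be illustrated with one simple scheme. The Python sketch below assumes a per-protein anchoring: each protein keeps its across-sample ratios from proteomics, and its absolute level is fixed by a reference mass fraction measured in one reference condition. This scheme, the function name, the optional renormalization, and the toy numbers are assumptions for illustration and not necessarily the calibration procedure used in this study.

```python
import numpy as np


def anchor_to_reference(relative, ref_sample, ref_fractions, renormalize=True):
    """Hypothetical calibration: scale each protein's relative profile to a reference.

    relative: (proteins x samples) relative intensities, e.g. xTop-like values.
    ref_sample: column index of the reference condition.
    ref_fractions: (proteins,) absolute mass fractions in the reference condition,
        e.g. from ribosome profiling.
    Each protein keeps its across-sample ratios from proteomics; only its absolute
    level is set by the reference. With renormalize=True, each sample is rescaled
    so the quantified proteins sum to the same total as in the reference.
    """
    scale = ref_fractions / relative[:, ref_sample]
    calibrated = relative * scale[:, None]
    if renormalize:
        calibrated = calibrated * (ref_fractions.sum() / calibrated.sum(axis=0))
    return calibrated


# Made-up relative intensities (3 proteins x 3 samples); sample 0 is the reference.
relative = np.array([
    [100.0, 200.0, 50.0],
    [10.0, 10.0, 20.0],
    [1.0, 0.5, 2.0],
])
ref_fractions = np.array([0.5, 0.3, 0.2])  # made-up reference mass fractions
print(anchor_to_reference(relative, 0, ref_fractions, renormalize=False))
print(anchor_to_reference(relative, 0, ref_fractions))
```

The renormalization step assumes the quantified proteins account for (nearly) all protein mass in every sample; without it, per-sample totals drift whenever proteins change in opposite directions.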
A more detailed examination of individual proteins in nutrient limitation, stress conditions, and various commonly used media and genotypes revealed several surprises, including the commonality of the response to growth on different carbon sources, the impact of micronutrients in the growth medium, the lack of a proteome-wide response to non-metabolic stresses, and factors affecting motility and biofilm formation. These findings shed new light on the physiological responses of E. coli to environmental and genetic perturbations and generate a variety of interesting hypotheses to be examined in follow-up studies.

Results

Workflow development

We developed a versatile workflow for the relative and absolute quantification of E. coli proteomes across many samples using DIA/SWATH mass spectrometry. Peptide-centric analysis of DIA/SWATH data requires a “spectral library” encapsulating prior knowledge about the chromatographic and mass spectrometric behavior of peptides. We generated a comprehensive E. coli spectral library from a diverse set of E. coli proteomes. Further, we developed a novel protein inference algorithm, termed xTop, and compared its performance with other commonly used inference algorithms, such as iBAQ (Schwanhausser et al, 2011) and TopPepN (Silva et al, 2006; Ludwig et al, 2012; Rosenberger et al, 2014).

Spectral library generation

To generate a comprehensive E. coli spectral library for peptide-centric DIA/SWATH data analysis, we followed the workflow illustrated in Fig 1A. To detect as many peptides and proteins as possible, including proteins expressed only under specific growth conditions, we grew E. coli cells in 34 diverse growth conditions, including exponential, stationary, and biofilm-forming conditions, exposure to a spectrum of stresses (high and low pH, hyperosmolarity, high temperature, oxidative stress), and a wide range of nutrient sources (Datasets EV1 and EV2).
All 34 samples were measured by DDA-based mass spectrometry on a quadrupole time-of-flight mass spectrometer (TripleTOF 5600, Sciex). To further increase proteome coverage, a pooled sample was fractionated by peptide off-gel electrophoresis (OGE) into 13 fractions, which were measured individually by DDA proteomics. This approach increased the peptide coverage from ~ 10,000 for a typical DDA measurement to a total of 26,285 unique peptide sequences, corresponding to 2,770 unique E. coli proteins (64% of all annotated E. coli proteins) (Fig 1B). About three quarters of the identified proteins were detected with more than three peptides (Fig 1C). The resulting spectral library is freely available in different formats through the SWATHAtlas repository (PASS01421) and can be used by the mass spectrometry community as a comprehensive resource for acquiring and analyzing mass spectrometric data from the model organism E. coli.

Figure 1. Spectral library generation to target the Escherichia coli proteome
A. Workflow employed to generate a comprehensive E. coli spectral library. Step 1: E. coli cells from various strains were grown under a wide range of conditions, including different sampling time points, growth media, high and low pH, aerobic and anaerobic growth, different temperatures, high and low osmolarity, and different nutrient additives. Peptide fractionation by off-gel electrophoresis (OGE) was performed on a “MixAll” sample. Step 2: All samples were measured in data-dependent acquisition (DDA) mode on a TripleTOF 5600 instrument; in total, 53 MS injections were performed. Step 3: MS2 spectra were matched to the canonical E. coli proteome, and a consensus spectral library was generated.
B. Numbers of proteins, peptides, precursors, and transitions in the E. coli spectral library. Statistics are given for unique proteins and peptides only, as well as for all entries, which also include shared peptides, iRT peptides, and 9 control proteins not from E. coli.
C. Distribution of detectable unique peptides per protein.

From peptides to proteins: the xTop algorithm

Next, we developed a novel quantitative protein inference algorithm, termed “xTop”, which exploits and combines information from all peptides of a given protein detected across all samples to infer the absolute protein intensity in each sample. Salient features of the xTop algorithm are illustrated in Fig 2A. For each protein, the intensities of its peptide precursors $p$ across samples $s$ form the elements $I_{ps}$ of a matrix. This matrix is modeled as the product of two components, the sample-dependent xTop intensity $I_s^{\mathrm{xTop}}$ and the peptide-specific detection efficiency $\varepsilon_p$, i.e., $I_{ps} \approx \varepsilon_p \, I_s^{\mathrm{xTop}}$. Both components are estimated from the data matrix $I_{ps}$ using maximum a posteriori probability (MAP) estimators (summarized in Figure N2.2 within Appendix Note S2). Importantly, the xTop protein intensity is obtained as a weighted average of all peptide precursor intensities: peptides whose intensities are highly mutually consistent across samples contribute the most to $I_s^{\mathrm{xTop}}$, while peptides weakly correlated with the others contribute the least. This mitigates the impact of missing or noisy peptide precursors on the inferred protein intensities. An in-depth description of the method and its implementation is provided in Appendix Note S2.

Figure 2. xTop protein quantification and comparison to other methods
A. xTop is a protein inference algorithm that models, for each protein, the intensities of peptide precursors as the product of the xTop protein intensity $I_s^{\mathrm{xTop}}$ in sample $s$ and a detection efficiency $\varepsilon_p$ for each peptide precursor $p$, relative to the peptide with the largest intensity (Top1). This allows information from the whole dataset to be integrated consistently and minimizes the impact of missing peptides on the inferred protein intensity.
B. We collected 3 biological samples of E. coli K-12 MG1655 (EQ353) in glucose minimal medium, matching the strain and condition of Li et al (2014). Two of the three biological replicates were injected 3 times, for a total of 7 proteomics “calibration” datasets. These samples were used to test the reproducibility of the proteomics measurements and the absolute quantification.
C. Peptide precursor intensities measured for the RseA protein across the seven calibration samples. Different symbols and colors indicate different unique peptide precursors. Peptide-level intensities are reported in Datasets EV4 and EV5.
D–G. Protein intensities (red open diamonds) obtained from the data in panel (C) (also shown in these panels as smaller symbols), computed with four protein inference algorithms: TopPep1, TopPep3, iBAQ, and xTop (Dataset EV6).
H. Variance of the log-ratio of protein intensities between technical (samples F1-1 and F1-2) or biological (samples A1-1 and F1-1) replicates, using the same proteins (N = 1,631) quantified in all samples by each method (see also Appendix Fig S2).
I. For each protein, the coefficient of variation (CV) of the protein intensities was computed across the seven calibration samples using the same N = 1,939 proteins, excluding non-detected proteins. The bar graph shows the median CV for each of the four methods.
J–L. Scatter plots of the CV computed with TopPep1, TopPep3, and iBAQ against that of xTop. An excess of points is visible above the diagonal (blue line), especially for TopPep3 and iBAQ.
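The factorization $I_{ps} \approx \varepsilon_p \, I_s^{\mathrm{xTop}}$, with $\varepsilon_p$ expressed relative to the Top1 peptide, can be illustrated with a deliberately simplified estimator. The Python sketch below fits the model by unweighted alternating least squares on log intensities, using only the observed entries. This is an assumed stand-in, not xTop itself: xTop uses MAP estimation and weights peptides by their mutual consistency (Appendix Note S2), both of which this sketch omits. The function name `fit_product_model` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_product_model(intensities, max_iter=1000, tol=1e-12):
    """Fit I_ps ~ eps_p * I_s to observed entries by alternating least squares in log space.

    Simplified stand-in for xTop: unweighted, no priors. np.nan marks missing entries.
    Assumes every peptide and every sample has at least one observation and that
    the observed entries link all peptides and samples together.
    Returns (I_s per sample, eps_p per peptide), scaled so the most efficiently
    detected peptide (Top1) has eps = 1.
    """
    observed = ~np.isnan(intensities)
    weights = observed.astype(float)
    log_y = np.log(np.where(observed, intensities, 1.0))  # missing -> 0, masked by weights
    log_eps = np.zeros(intensities.shape[0])
    log_i = np.zeros(intensities.shape[1])
    for _ in range(max_iter):
        # Sample step: mean of (log y_ps - log eps_p) over peptides observed in sample s.
        new_log_i = ((log_y - log_eps[:, None]) * weights).sum(axis=0) / weights.sum(axis=0)
        # Peptide step: mean of (log y_ps - log I_s) over samples where peptide p is observed.
        new_log_eps = ((log_y - new_log_i[None, :]) * weights).sum(axis=1) / weights.sum(axis=1)
        change = max(np.abs(new_log_i - log_i).max(), np.abs(new_log_eps - log_eps).max())
        log_i, log_eps = new_log_i, new_log_eps
        if change < tol:
            break
    shift = log_eps.max()  # Top1 normalization: largest efficiency becomes 1
    return np.exp(log_i + shift), np.exp(log_eps - shift)


# Synthetic data built from known factors, then three entries removed.
true_eps = np.array([1.0, 0.5, 0.25, 0.125])
true_i = np.array([6.0e5, 8.0e5, 8.0e5, 7.0e5])
data = np.outer(true_eps, true_i)
data[0, 0] = np.nan  # top peptide missing in sample 0
data[1, 2] = np.nan
data[3, 1] = np.nan

est_i, est_eps = fit_product_model(data)
print(est_i)    # should match true_i up to floating-point error
print(est_eps)  # should match true_eps
```

Because each sample's estimate draws on every peptide observed in it, the sample lacking the top peptide still receives an intensity on the same scale as the others, the behavior contrasted with TopPep1 in Fig 2D and G. Working in log space turns the product into a sum, so each update is a plain average; with noisy data, however, an unweighted fit lets inconsistent peptides pull the estimate, which is what xTop's consistency-based weighting is meant to limit.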
Assessment of xTop performance

To validate the xTop method and benchmark it against other commonly used protein inference methods (TopPep1/3 and iBAQ), we grew three replicate cultures of E. coli K-12 MG1655 (sub-strain EQ353) exponentially in minimal medium (MOPS + glucose) (Fig 2B). These “calibration samples” (A1, C1, F1) were measured by DIA/SWATH mass spectrometry using an acquisition method with 64 variable SWATH windows (Collins et al, 2017). For two of the three calibration samples (A1 and F1), we additionally performed three technical MS injection replicate measurements. We analyzed the data from these 7 calibration runs in a peptide-centric way using the E. coli spectral library described above and the OpenSWATH software (Rost et al, 2014), obtaining quantitative intensity values for 18,731 peptide precursors (Datasets EV4 and EV5). Peptide intensities were strongly correlated between technical and biological replicates, with Pearson coefficients (r) above 0.987 and 0.966, respectively (Appendix Fig S1A). The median coefficient of variation (CV) for technical and biological replicates was 5.5 and 10.8%, respectively (Appendix Fig S1B and C). To illustrate the effect of missing peptides on the inferred protein intensities, we considered the protein RseA, an anti-sigma factor. As shown in Fig 2C, four peptide precursors (open symbols of different colors) are detected across the seven replicates of the calibration condition, but not all of them are detected in every sample. In particular, the peptide precursor with the highest intensity (yellow circles) is not detected in the three replicates of sample A1, while the peptide precursors represented by green up-triangles and blue down-triangles are not uniformly detected across the technical replicates.
These detection gaps strongly affect the protein intensities inferred by the TopPep1/3 and iBAQ algorithms, as shown in Fig 2D–F. First, the TopPep1 algorithm only provides a protein intensity in the four samples in which the top peptide was detected (Fig 2D, red diamonds); in the other three samples, the protein is declared “not detected”. For both TopPep3 and iBAQ (Fig 2E and F, respectively), the missing peptides lead to considerable scatter in the inferred protein intensities (red diamonds), even though the scatter in the intensities of each detected peptide is much smaller. The xTop algorithm combines the intensities of all detected peptides across these samples; its inferred protein intensities (Fig 2G, red diamonds) show little scatter across replicates compared to those generated by TopPep1/3 and iBAQ.

Proteome-wide results confirm the expectations from this example. First, TopPep1 detects on average 1,780 proteins across the calibration samples, about 100 fewer than the other algorithms (which detect between 1,885 and 1,893 proteins). Both technical and biological replicates are strongly correlated (r > 0.98) (Appendix Fig S2A and D). However, a clear improvement of xTop over the other methods is seen when comparing the variance of the ratio of intensities between pairs of replicates, as summarized in Fig 2H (see also Appendix Fig S2B and E). For both technical and biological replicates, xTop shows the least scatter, while TopPep3 and iBAQ show the most. The improvement of xTop over the other algorithm

References