metaQuantome: An Integrated, Quantitative Metaproteomics Approach Reveals Connections Between Taxonomy and Protein Function in Complex Microbiomes
2019; Elsevier BV; Volume: 18; Issue: 8; Language: English
10.1074/mcp.ra118.001240
ISSN 1535-9484
Authors: Caleb Easterly, Ray Sajulga, Subina Mehta, James E. Johnson, Praveen Kumar, Shane L. Hubler, Bart Mesuere, Joel Rudney, Timothy J. Griffin, Pratik Jagtap
Topic(s): Genomics and Phylogenetic Studies
Abstract: Microbiome research offers promising insights into the impact of microorganisms on biological systems. Metaproteomics, the study of microbial proteins at the community level, integrates genomic, transcriptomic, and proteomic data to determine the taxonomic and functional state of a microbiome. However, standard metaproteomics software is subject to several limitations, commonly supporting only spectral counts, emphasizing exploratory analysis rather than hypothesis testing, and rarely offering the ability to analyze the interaction of function and taxonomy, that is, which taxa are responsible for different processes. Here we present metaQuantome, a novel, multifaceted software suite that analyzes the state of a microbiome by leveraging complex taxonomic and functional hierarchies to summarize peptide-level quantitative information, emphasizing label-free intensity-based methods. For experiments with multiple experimental conditions, metaQuantome offers differential abundance analysis, principal components analysis, and clustered heat map visualizations, as well as exploratory analysis for a single sample or experimental condition. We benchmark metaQuantome analysis against standard methods, using two previously published datasets: (1) an artificially assembled microbial community dataset (taxonomy benchmarking) and (2) a dataset with a range of recombinant human proteins spiked into an Escherichia coli background (functional benchmarking). Furthermore, we demonstrate the use of metaQuantome on a previously published human oral microbiome dataset. In both the taxonomic and functional benchmarking analyses, metaQuantome quantified taxonomic and functional terms more accurately than standard summarization-based methods. We use the oral microbiome dataset to demonstrate metaQuantome's ability to produce publication-quality figures and elucidate biological processes of the oral microbiome.
metaQuantome enables advanced investigation of metaproteomic datasets, which should be broadly applicable to microbiome-related research. In the interest of accessible, flexible, and reproducible analysis, metaQuantome is open source and available on the command line and in Galaxy.

Microbiome analysis has enabled the understanding of the effect of microorganisms on diverse biological systems (1. Gilbert J.A. Blaser M.J. Caporaso J.G. Jansson J.K. Lynch S.V. Knight R. Current understanding of the human microbiome. Nat. Med. 2018; 24: 392-400, 2. Moran M.A. The global ocean microbiome. Science. 2015; 350: aac8455, 3. Fierer N. Embracing the unknown: Disentangling the complexities of the soil microbiome. Nat. Rev. Microbiol. 2017; 15: 579-590, 4. Hörmannsperger G. Schaubeck M. Haller D. Intestinal microbiota in animal models of inflammatory diseases. ILAR J. 2015; 56: 179-191). The microbiome can be studied using a variety of methods, including metagenomics (5. Kuczynski J. Costello E.K. Nemergut D.R. Zaneveld J. Lauber C.L. Knights D. Koren O. Fierer N. Kelley S.T. Ley R.E. Gordon J.I. Knight R. Direct sequencing of the human microbiome readily reveals community differences. Genome Biol. 2010; 11: 210, 6. Quince C. Walker A.W. Simpson J.T. Loman N.J. Segata N. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol.
2017; 35: 833-844, 7. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012; 486: 207-214), metatranscriptomics (8. Bashiardes S. Zilberman-Schapira G. Elinav E. Use of Metatranscriptomics in microbiome research. Bioinform. Biol. Insights. 2016; 10: 19-25), and metaproteomics (9. Wilmes P. Heintz-Buschart A. Bond P.L. A decade of metaproteomics: Where we stand and what the future holds. Proteomics. 2015; 15: 3409-3417). Metaproteomics studies detect the presence and abundance of microbial peptides and proteins, offering a more direct understanding of the processes being catalyzed by the microbiome than metatranscriptomics and metagenomics (9, 10. Verberkmoes N.C. Russell A.L. Shah M. Godzik A. Rosenquist M. Halfvarson J. Lefsrud M.G. Apajalahti J. Tysk C. Hettich R.L. Jansson J.K. Shotgun metaproteomics of the human distal gut microbiota. ISME J. 2009; 3: 179-189, 11. Xiong W. Giannone R.J. Morowitz M.J. Banfield J.F. Hettich R.L. Development of an enhanced metaproteomic approach for deepening the microbiome characterization of the human infant gut. J. Proteome Res. 2015; 14: 133-141, 12. Heyer R. Schallert K. Zoun R. Becher B. Saake G. Benndorf D. Challenges and perspectives of metaproteomic data analysis. J. Biotechnol. 2017; 261: 24-36, 13. Kolmeder C.A. de Vos W.M. Metaproteomics of our microbiome - Developing insight in function and activity in man and model systems. J. Proteomics.
2014; 97: 3-16, 14. Heintz-Buschart A. Wilmes P. Human gut microbiome: Function matters. Trends Microbiol. 2018; 26: 563-574, 15. Wilmes P. Bond P.L. The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms. Environ. Microbiol. 2004; 6: 911-920, 16. Zhang X. Figeys D. Perspective and guidelines for metaproteomics in microbiome studies. J. Proteome Res. 2019; 18: 2370-2380). Furthermore, metaproteomics allows the analysis of both taxonomic abundance and functional state from the same mass spectrometry data.

Abbreviations used: MS/MS, tandem mass spectrometry; MS1, precursor mass spectrum; UPS1, UPS2, Universal Proteomics Standards 1 and 2; GO, Gene Ontology; EC, enzyme commission; NCBI, National Center for Biotechnology Information; LCA, lowest common ancestor; MSE, mean squared error; L2FC, logarithm base 2 of the fold change; WS, with sucrose; NS, no sucrose.

Although metaproteomics is an important component of microbiome research and a complement to other 'omics analyses, limitations in current software restrict the range of methods and accuracy of analyses that can be carried out. First, metaproteomics studies have traditionally quantified peptides with spectral counts, based on counting the number of tandem mass (MS/MS) spectra assigned to peptides or proteins (17. Lundgren D.H. Hwang S.-I. Wu L. Han D.K.
Role of spectral counting in quantitative proteomics. Expert Rev. Proteomics. 2010; 7: 39-53). Accordingly, many available metaproteomics tools support only spectral counting-based quantification, including MEGAN (18. Huson D.H. Beier S. Flade I. Górska A. El-Hadidi M. Mitra S. Ruscheweyh H.-J. Tappu R. MEGAN community edition - Interactive exploration and analysis of large-scale microbiome sequencing data. PLOS Comput. Biol. 2016; 12: e1004957), metaGOmics (19. Riffle M. May D.H. Timmins-Schiffman E. Mikan M.P. Jaschob D. Noble W.S. Nunn B.L. MetaGOmics: A web-based tool for peptide-centric functional and taxonomic analysis of metaproteomics data. Proteomes. 2017; 6: E2), and Unipept (20. Gurdeep Singh R. Tanca A. Palomba A. Van der Jeugt F. Verschaffelt P. Uzzau S. Martens L. Dawyndt P. Mesuere B. Unipept 4.0: Functional analysis of metaproteome data. J. Proteome Res. 2019; 18: 606-615). However, research has shown that spectral counts offer a less accurate estimate of peptide abundance than the spectral intensity of the precursor peptide (typically measured either by integrating the MS1 peak or by recording the apex intensity) (21. Cox J. Hein M.Y. Luber C.A. Paron I. Nagaraj N. Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics. 2014; 13: 2513-2526). Second, some available bioinformatics tools that intend to support microbiome analysis follow a "gene list" approach and require explicit protein or gene inference, such as DAVID (22. Huang D.W. Sherman B.T. Tan Q. Kir J. Liu D. Bryant D. Guo Y. Stephens R. Baseler M.W. Lane H.C. Lempicki R.A.
DAVID bioinformatics resources: Expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 2007; 35: W169-W175). In metaproteomics, however, it is sometimes difficult to unambiguously assign a parent protein to a detected peptide because proteins between and within species can be highly homologous (23. Muth T. Behne A. Heyer R. Kohrs F. Benndorf D. Hoffmann M. Lehtevä M. Reichl U. Martens L. Rapp E. The MetaProteomeAnalyzer: A powerful open-source software suite for metaproteomics data analysis and interpretation. J. Proteome Res. 2015; 14: 1557-1565). Other tools only support certain types of microbiota in a small number of organisms, such as iMetaLab (24. Liao B. Ning Z. Cheng K. Zhang X. Li L. Mayne J. Figeys D. iMetaLab 1.0: A web platform for metaproteomics data analysis. Bioinformatics. 2018; 34: 3954-3956), which only supports mouse and human gut microbiome analysis. Furthermore, metaproteomics tools rarely offer the ability to directly compare many samples or multiple experimental conditions. Some, such as Unipept, focus on detailed exploratory analysis of a single sample. Others, such as metaGOmics, allow comparison between only two samples. However, as metaproteomics is marked by large datasets and many thousands of functional terms and dozens of taxa, it is essential to compare larger numbers of samples to distinguish true effects from random variation. In addition, available metaproteomics tools rarely offer methods to filter out redundant annotations, leading to less informative conclusions from the data.
Finally, while both the taxonomic origin and functional role of peptides (more specifically, of their parent protein) can be determined, few metaproteomics software tools are able to explore the function-taxonomy interaction, that is, the contribution of different taxa to a given functional process and vice versa.

In this manuscript, we present a new software suite called metaQuantome, which is composed of several complementary functionalities developed to fill some of the aforementioned gaps in metaproteomic bioinformatics tools. metaQuantome is free and open source and is available via GitHub, Bioconda (25. Grüning B. Dale R. Sjödin A. Chapman B.A. Rowe J. Tomkins-Tinch C.H. Valieris R. Köster J. Bioconda Team. Bioconda: Sustainable and comprehensive software distribution for the life sciences. Nat. Methods. 2018; 15: 475-476), and Galaxy (26. Afgan E. Baker D. Batut B. van den Beek M. Bouvier D. Cech M. Chilton J. Clements D. Coraor N. Grüning B.A. Guerler A. Hillman-Jackson J. Hiltemann S. Jalili V. Rasche H. Soranzo N. Goecks J. Taylor J. Nekrutenko A. Blankenberg D. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 2018; 46: W537-W544). To our knowledge, metaQuantome is the only software to enable fully quantitative differential abundance analysis of the functional and taxonomic profile of a metaproteome and one of only a few software tools to enable function-taxonomy interaction analysis. metaQuantome is amenable to data quantified using peptide-level MS1 intensity values, as well as data quantified by more traditional spectral counting methods. It also utilizes functional annotation and taxonomic annotation, generated by any software, to carry out a multifaceted analysis of a metaproteomics dataset, without requiring the use of a specific database or explicit protein inference.
Importantly, it provides novel and powerful functionality for analyzing function-taxonomy interactions, enabling users to determine microbe-specific contributions to the functional profile, or the profile of microbes contributing to a specific functional protein class, and to visualize the results from these investigations.

We evaluate the accuracy of metaQuantome in quantifying abundance measures of taxa and biochemical functions indicated from peptide abundance data, compared with standard summarization-based methods. First, we benchmark taxonomic abundance estimation using a mock microbial community dataset (27. Kleiner M. Thorson E. Sharp C.E. Dong X. Liu D. Li C. Strous M. Assessing species biomass contributions in microbial communities via metaproteomics. Nat. Commun. 2017; 8: 1558). We also benchmark functional abundance estimation with a dataset consisting of the Universal Proteomics Standards 1 and 2 (UPS1 and UPS2, Sigma-Aldrich) spiked into an E. coli background (21). Finally, we demonstrate the analysis and visualization capabilities of the software on a previously published oral microbiome dataset (28. Rudney J.D. Jagtap P.D. Reilly C.S. Chen R. Markowski T.W. Higgins L. Johnson J.E. Griffin T.J. Protein relative abundance patterns associated with sucrose-induced dysbiosis are conserved across taxonomically diverse oral microcosm biofilm models of dental caries. Microbiome. 2015; 3: 69). Our results demonstrate the value of metaQuantome for quantitative analysis of metaproteomics data and advanced exploration of these datasets for microbiome characterization.
metaQuantome is a software suite developed in Python using an object-oriented framework and has a command-line interface divided into several modules (Fig. 1A). The modular structure allows for efficient workflows and examination of the data files at each stage of analysis. In the design of the software, we have leveraged the similarities between different functional and taxonomic annotation types to reduce code duplication. metaQuantome is open source under the Apache 2.0 license, and the source code is available for examination at https://github.com/galaxyproteomics/metaquantome. A detailed description of each module follows. Throughout the text, we use "intensity" to refer to the measured spectral intensity from the mass spectrometer and "abundance" to refer to the relative presence of a peptide, taxon, or functional term in the sample.

The database (db) module downloads the reference databases: Gene Ontology (GO) terms (29. Gene Ontology Consortium. Gene Ontology Consortium: Going forward. Nucleic Acids Res. 2015; 43: D1049-D1056), Enzyme Commission (EC) numbers (30. Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. 2000; 28: 304-305), and the National Center for Biotechnology Information (NCBI) taxonomy database (31. Federhen S. The NCBI taxonomy database. Nucleic Acids Res. 2012; 40: D136-D143). We have leveraged existing Python libraries to facilitate the use of these databases: ete3 (32. Huerta-Cepas J. Serra F. Bork P. ETE 3: Reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 2016; 33: 1635-1638) (for taxonomy), GOATOOLS (33. Klopfenstein D.V. Zhang L. Pedersen B.S. Ramírez F. Vesztrocy A.W. Naldi A. Mungall C.J. Yunes J.M. Botvinnik O. Weigel M. Dampier W. Dessimoz C. Flick P. Tang H. GOATOOLS: A Python library for Gene Ontology analyses. Sci. Rep.
2018; 8: 10872) (for GO terms), and Biopython (34. Cock P.J. Antao T. Chang J.T. Chapman B.A. Cox C.J. Dalke A. Friedberg I. Hamelryck T. Kauff F. Wilczynski B. de Hoon M.J. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009; 25: 1422-1423) (for the ENZYME database).

After downloading the databases, the next module in the metaQuantome analysis is expand, in which we expand the set of all directly annotated functional or taxonomy terms to include all terms implied by the original annotations (Fig. 1B). We use the term "implied" because many domains of biological knowledge are organized hierarchically, where more specific annotations imply more general annotations above them in the hierarchy, also known as "parents" (one level above in the hierarchy) or "ancestors" (any number of levels above in the hierarchy). For example, the taxonomic annotation "Streptococcus genus" is a parent term to "Streptococcus mutans species." Similarly, hierarchical functional ontologies include GO terms and EC numbers, both of which are supported in metaQuantome. Often, taxonomic and functional annotation tools only provide the most specific term or terms associated with a peptide. For example, Unipept annotates peptides with their lowest common ancestor (LCA), the most specific taxon that is consistent with all potential parent proteins for that peptide (35. Mesuere B. Devreese B. Debyser G. Aerts M. Vandamme P. Dawyndt P. Unipept: Tryptic peptide-based biodiversity analysis of metaproteome samples. J. Proteome Res. 2012; 11: 5773-5780). Therefore, the information returned by annotation tools such as Unipept is often not the full set of information associated with that annotation. In metaQuantome, we expand the set of original annotations to include all the ancestors of the direct annotations.
To do this, we have defined several custom Python classes that mirror the structure of the annotation hierarchies. Specifically, each term is defined as an instance of the class AnnotationNode, which contains variables specifying the precursor intensity, the number of unique peptides annotated with that term, and other data (for each experimental sample). The AnnotationNodes are collected into an AnnotationHierarchy, which propagates observed intensities for a term up to each of the term's ancestors. That is, the total abundance of a taxon or functional term is calculated as the sum of the abundances of all peptides annotated with the term and/or any of its descendants (see Fig. 1C), an approach that was also used with spectral counts in metaGOmics (19). This allows users to examine their data at different levels of generality. For example, while many peptides may not be specific to a species, examining a taxonomic family allows for estimating the abundance of all species-specific peptides and those specific to the relevant genus and family.

The expand process for function-taxonomy interaction analysis is slightly different (Fig. 2). First, taxonomic annotations are "mapped" to the desired rank; for example, a genus is mapped to the associated family. The annotations that have a lower rank than the desired rank are removed. The directly annotated GO terms are used without modification unless the user selects the "map to slim" option. In that case, each GO term is mapped to its closest relative in the GO slim, which is a smaller set of more general GO terms. Finally, the total abundance for a taxon/GO term combination is calculated as the sum of peptide abundances annotated with the taxon/GO term pair.
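The propagation and function-taxonomy summation described above can be illustrated with a minimal sketch. This is not metaQuantome's implementation (which uses the AnnotationNode and AnnotationHierarchy classes). The toy hierarchy, the rank table, the peptide sequences, the intensities, and the function names `ancestors`, `expand`, `map_to_rank`, and `function_taxonomy` are all assumptions made for illustration. The rank filter assumes that annotations which cannot be resolved to the requested rank are the ones removed.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: child term -> parent term (None marks the root).
PARENT = {
    "Streptococcus mutans": "Streptococcus",
    "Streptococcus oralis": "Streptococcus",
    "Streptococcus": "Streptococcaceae",
    "Streptococcaceae": None,
}
# Hypothetical rank table for the same toy taxa.
RANK = {
    "Streptococcus mutans": "species",
    "Streptococcus oralis": "species",
    "Streptococcus": "genus",
    "Streptococcaceae": "family",
}


def ancestors(term):
    """Return every ancestor of a term, nearest first."""
    found = []
    node = PARENT.get(term)
    while node is not None:
        found.append(node)
        node = PARENT.get(node)
    return found


def expand(annotations):
    """Add each peptide's intensity to its direct term and to all ancestors.

    annotations: iterable of (peptide_sequence, direct_term, intensity).
    """
    intensity = defaultdict(float)
    supporting = defaultdict(set)
    for sequence, term, value in annotations:
        for t in [term] + ancestors(term):
            intensity[t] += value
            supporting[t].add(sequence)
    return {
        t: {"intensity": intensity[t], "n_peptides": len(supporting[t])}
        for t in intensity
    }


def map_to_rank(taxon, rank):
    """Map a taxon to itself or its ancestor at `rank`; None if it cannot be mapped."""
    for t in [taxon] + ancestors(taxon):
        if RANK[t] == rank:
            return t
    return None


def function_taxonomy(rows, rank):
    """Sum intensities per (taxon at `rank`, GO term) pair.

    rows: iterable of (peptide_sequence, taxon, go_term, intensity).
    """
    totals = defaultdict(float)
    for _sequence, taxon, go_term, value in rows:
        mapped = map_to_rank(taxon, rank)
        if mapped is not None:  # drop annotations that cannot reach the rank
            totals[(mapped, go_term)] += value
    return dict(totals)


# Made-up peptides and intensities, for illustration only.
expanded = expand([
    ("PEPTIDEA", "Streptococcus mutans", 100.0),
    ("PEPTIDEB", "Streptococcus oralis", 50.0),
    ("PEPTIDEC", "Streptococcus", 25.0),
])
pairs = function_taxonomy(
    [
        ("PEPTIDEA", "Streptococcus mutans", "GO:0005975", 100.0),
        ("PEPTIDEB", "Streptococcus oralis", "GO:0005975", 50.0),
        ("PEPTIDEC", "Streptococcaceae", "GO:0005975", 25.0),
    ],
    rank="genus",
)
```

In this toy run the genus term accumulates the intensity of both species-level peptides plus the genus-level peptide, while the family-level annotation is dropped from the genus-level function-taxonomy table because it cannot be resolved to a genus.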
The required input for the expand module is:
1. Quantitative information: a tabular file with peptide sequences and the associated intensities. The values can be calculated using any accepted label-free methods, such as MS1 intensity measurements or spectral counting. Prior to use in metaQuantome, the values should be normalized (36. Välikangas T. Suomi T. Elo L.L. A systematic evaluation of normalization methods in quantitative label-free proteomics. Brief. Bioinform. 2018; 19: 1-11).
2. Functional and/or taxonomic information: tabular files with peptide sequences and associated functional terms (either GO terms, EC numbers, or COG categories (37. Galperin M.Y. Kristensen D.M. Makarova K.S. Wolf Y.I. Koonin E.V. Microbial genome analysis: The COG approach. Brief. Bioinform. 2017; (10.1093/bib/bbx117)), for functional analysis) and/or taxonomic LCA assignments (for taxonomic analysis).
3. The databases downloaded by the metaQuantome db module (described earlier).

Aside from the databases, the quantitative information and the functional and/or taxonomic annotations utilized by this module may be derived from any software. Therefore, metaQuantome can always be used with the most up-to-date quantification and annotation tools. The output of the expand module is a tabular file with columns for the term identifiers (IDs), associated descriptive information, aggregated precursor intensities, number of unique peptides annotated, and number of sample children (described below).

The filter module should be used before carrying out any visualization or statistics on the output file. Because the analysis of many datasets results in many thousands of functional and taxonomic terms, quality control is essential to ensure that spurious term assignments do not mask true term detections. We employ three strategies to ensure that detected terms are well-supported by the data and are nonredundant (see Fig. 3).
First, the user may specify that a term must be supported by a minimum number of distinct peptide sequences (different peptide sequences annotated with the term in question) (Fig. 3A). This allows for filtering out spurious taxonomic or functional terms in which we have lower confidence due to relatively low amounts of supporting data. To enable this filtering, metaQuantome calculates the number of peptides giving evidence for the presence of the term, which is the number of unique peptides directly annotated with the term and/or any of its descendants. Note the difference between the terms "children" and "descendants" as used here. Descendants of a term A are those terms that are any number of levels below A in the hierarchy and are instances of A, while children of A are descendants that are exactly one level below A. Next, metaQuantome allows for filtering out redundant terms, which we define in this case as terms that carry exactly the same quantitative information as a child, which happens when a term has exactly one child term in the data. To filter out these redundant terms, metaQuantome calculates the "sample children" (children in the dataset) of each term in the expanded hierarchy, then keeps only those with no sample children or at least the number of sample children set by the user (Fig. 3B). The term "sample children" is used to distinguish between a term's children in the database and the term's children in the sample. For example, the GO term "biological adhesion" (GO:0022610) has four children in the GO database as of February 25, 2019 (multicellular organism adhesion, adhesion of symbiont to host, cell adhesion, intermicrovillar adhesion). However, for a given sample, the term "biological adhesion" may only have two children observed in the sample (i.e., detected peptides might be annotated with "multicellular organism adhesion" and "cell adhesion" and not the others). In this case, biological adhesion would have two sample children.
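The first two filtering strategies can be sketched as follows, for a single sample. This is an illustrative sketch rather than metaQuantome's own code. The parent map, the peptide counts, the thresholds, and the function names `sample_children` and `filter_terms` are hypothetical, and the term "cell-cell adhesion" is included only to give "cell adhesion" a single observed child.

```python
# Hypothetical parent map restricted to terms observed in one sample.
PARENT = {
    "biological adhesion": None,
    "cell adhesion": "biological adhesion",
    "multicellular organism adhesion": "biological adhesion",
    "cell-cell adhesion": "cell adhesion",
}

# Hypothetical expanded table: unique peptides supporting each term
# (direct annotations plus those of all descendants).
expanded = {
    "biological adhesion": {"n_peptides": 5},
    "cell adhesion": {"n_peptides": 4},
    "multicellular organism adhesion": {"n_peptides": 1},
    "cell-cell adhesion": {"n_peptides": 4},
}


def sample_children(term, table):
    """Count the children of `term` that are present in the sample table."""
    return sum(1 for t in table if PARENT.get(t) == term)


def filter_terms(table, min_peptides, min_children_non_leaf):
    """Keep terms with enough peptide support that are also nonredundant.

    A term is treated as nonredundant if it has no sample children (a leaf
    in the sample) or at least `min_children_non_leaf` sample children.
    """
    kept = {}
    for term, row in table.items():
        n_children = sample_children(term, table)
        supported = row["n_peptides"] >= min_peptides
        nonredundant = n_children == 0 or n_children >= min_children_non_leaf
        if supported and nonredundant:
            kept[term] = row
    return kept


filtered = filter_terms(expanded, min_peptides=2, min_children_non_leaf=2)
```

Here "cell adhesion" is removed because its only sample child carries the same peptides, and "multicellular organism adhesion" is removed for insufficient peptide support. Sample children are counted on the unfiltered table, so "biological adhesion" still has two sample children and is kept.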
When multiple samples are being analyzed, the user is able to select the minimum number of samples per experimental condition for which the criteria must be met for both number of peptides and number of sample children. Finally, metaQuantome can filter terms down to those that are quantified in a minimum number of samples per experimental condition (Fig. 3C). This is especially useful in processing multireplicate datasets for statistical analysis, where, for a given term, a minimum of three replicates per experimental condition is necessary. The output of the filter module is a tabular file with the same columns as that from the expand module, with rows (annotations) that do not fit the specified criteria removed. This file may be used in the stat or viz modules, depending on the researcher's question.

The stat module offers methods for the analysis of differential functional abundance and differential taxonomic abundance between two experimental conditions, using validated statistical analysis functions from the statsmodels Python package (38. Seabold S. and Perktold J. (2010) Statsmodels: Econometric and Statistical Modeling with Python, in Proceedings of the 9th Python in Science Conference (Scipy), p. 61). The user may choose a standard parametric t test or a nonparametric rank sum test for unpaired samples and may also choose a parametric paired t test or a nonparametric Wilcoxon signed-rank test for paired samples (39. Ewens W.J. Grant G.R. Statistical methods in bioinformatics: An introduction. 2nd ed. Springer, New York, 2005). The resulting p values are corrected for multiple tests using the false discovery rate procedure (40. Benjamini Y. Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 1995; 57: 289-300). The results from the stat module may be displayed in a volcano plot, available within the viz module.
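To make the multiple-testing step concrete, the following pure-Python sketch computes Benjamini-Hochberg adjusted p values and a log2 fold change (L2FC). It is a stand-in for illustration only: per the text, metaQuantome relies on statsmodels for its statistics, and the function names, raw p values, and intensities below are invented.

```python
import math
from statistics import mean


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def log2_fold_change(condition_a, condition_b):
    """Log2 ratio of mean abundances between two conditions."""
    return math.log2(mean(condition_a) / mean(condition_b))


# Made-up raw p values for four terms, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

# Made-up abundances for one term in two conditions.
l2fc = log2_fold_change([4.0, 4.0, 4.0], [1.0, 1.0, 1.0])
```

A volcano plot of the kind mentioned above would then typically place the L2FC on the horizontal axis and the negative log10 of the adjusted p value on the vertical axis.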
The viz module of metaQuantome produces a variety of high-quality, publication-ready visualizations: barplots for the analysis of a single sample or experimental condition and differential abundance analysis, volcano plots, heatmaps, and principal components analysis for comparisons between two or more experimental conditions. The visualizations and some of the statistical operations are carried out by linking to R (41. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2018) code, due to R's unparalleled visualization capabilities. The visualizations are demonstrated in the Case Study subsection of the Results section. Beyond the built-in visualizations, the filter and stat modules generate a standard tabular file, which permits the user to utilize any preferred statistical or visualization software to ana