Methods for Studying Gut Microbiota: A Primer for Physicians
2018; Elsevier BV; Volume: 9; Issue: 1; Language: English
10.1016/j.jceh.2018.04.016
ISSN 2213-3453
Authors: Arkaprabha Sarangi, Amit Goel, Rakesh Aggarwal
Topic(s): Helicobacter pylori-related gastroenterology studies
Abstract: The human gastrointestinal tract contains a large variety of microbes, in particular bacteria. Studies in recent years have strongly suggested that these microbes, collectively referred to as gut microbiota, help maintain homeostasis during health. In addition, alterations in gut microbiota have been reported in several diseases. These include conditions related to the gastrointestinal tract as well as several systemic conditions, and the alterations are believed to play a pathogenetic role in at least some of them. Given the close association between the human gut and liver, the association with gut microbiota appears to be particularly strong for a wide variety of liver diseases. This piece, aimed primarily at physicians, briefly reviews the methods used to study gut microbiota, with particular emphasis on those that use sequences of the bacterial 16S rRNA gene or its components.

The term 'human microbiota' refers to the complete set of microbes that live in and on the human body [1: Marchesi J.R., Ravel J. The vocabulary of microbiome research: a proposal. Microbiome. 2015; 3: 31]. It appears to play a major role in health and disease. It can act directly, through the expression of microbial genes that give the human host metabolic capabilities its own genome lacks. It can also act indirectly, through interaction with human physiology, particularly with the immune system. The main sites on the human body where microbiota exist are the gastrointestinal tract, the female genital tract, the oral cavity and the respiratory tract. Of these, the gastrointestinal tract is the richest in microbial organisms. Several methods have been used to study gut microbiota, and these have changed considerably over time (Table 1). This article describes the various methods used to study microbiota, and the advantages and limitations of each.

The gut microbiota include several different groups of organisms, such as bacteria, viruses, fungi and archaea. Of these, bacteria have been the most extensively studied. Much less is known about the viruses (virome), fungi (fungome) and other prokaryotes (e.g. archaea) present in the gastrointestinal tract. This article therefore focuses on the study of gut bacteria, and henceforth the term 'gut microbiota' is used interchangeably with the set of bacteria present in a person's gut. Before proceeding further, it may help to understand a few related terms used in the study of microbiota (e.g. microbiome, metagenome). These represent concepts similar to, but not identical with, 'microbiota' (see Box 1).

Table 1 Techniques Used for Study of Microbiota in the Gut as well as Other Body Sites.
A. Culture-based methods
B. Molecular (nucleic acid-based) methods
   a. Non-sequencing methods
      i. Fluorescence in situ hybridization flow cytometry
      ii. Pulsed field gel electrophoresis
      iii. Denaturing gradient gel electrophoresis
      iv. Temperature gradient gel electrophoresis
      v. Single-strand conformation polymorphism
   b. Sequence-based methods
      i. Sequencing of 16S rRNA genes or their hypervariable regions (targeted gene sequencing)
      ii. Whole bacterial genome DNA (metagenome) sequencing
      iii. Whole bacterial mRNA (metatranscriptome) sequencing
C. Methods based on detection and quantification of small metabolites
   i. Gas chromatography mass spectrometry
   ii. Capillary electrophoresis coupled to mass spectrometry
   iii. Fourier-transform infrared spectroscopy
   iv. Nuclear and proton magnetic resonance spectroscopy

Box 1 Definitions
Microbiota: The assemblage of microorganisms present in a defined environment.
Metagenome: The collection of genomes and genes from the members of a microbiota. This collection is obtained through shotgun sequencing of nucleic acid extracted from a specimen (metagenomics), followed by assembly or mapping to a reference database and annotation.
Microbiome: The entire habitat, including the microorganisms (bacteria, archaea, lower and higher eukaryotes, and viruses), their genomes (i.e., genes) and the surrounding environmental conditions. However, the term is often also used for what is described above as 'metagenome'.
Metatranscriptomics: Analysis of the suite of expressed RNAs (meta-RNAs) by high-throughput sequencing of the corresponding meta-cDNAs. This approach provides information on the regulation and expression profiles of complex microbiomes. The resulting census of all expressed RNAs present in a specimen is called the 'metatranscriptome'.
Metaproteomics: Characterization of the entire protein complement of environmental or clinical samples at a given point in time. The resulting census of all proteins present in any given specimen or tissue is called the 'proteome'.
Metabolomics: Determination of the metabolite profile(s) in any given specimen or tissue. The resulting census of all metabolites present in any given specimen or tissue is called the 'metabolome'.
These definitions are adapted from Marchesi et al. [1].

The initial studies used traditional bacterial culture techniques, followed by phenotyping of the cultured bacteria based on their morphological and biochemical characteristics.
However, a large proportion of gut bacteria are obligate anaerobes. These often do not survive the procedures used to obtain specimens from the gastrointestinal tract, or to transport them to the laboratory and store them. Furthermore, the various organisms in the human gut differ in their propensity to grow in culture. Thus, estimates of the relative abundance of various bacteria in the gut lumen obtained with culture-based techniques are heavily biased in favour of aerobic organisms that grow easily in vitro, and miss many anaerobic bacteria. These techniques also markedly underestimate the diversity of bacteria in intestinal luminal contents, which limits their usefulness for studying changes in the profile of gut microbiota. Hence, they never gained sufficient traction for profiling gut microbiota. Their use remained limited to the study of individual culturable bacterial groups (e.g. a particular genus) in specific clinical situations.

To overcome these limitations, several molecular approaches were developed in which bacterial species are identified from the sequences of their 16S ribosomal RNA (16S rRNA) genes. This became possible with the development in the late 20th century of techniques for studying bacterial genomic material. Each living cell contains ribosomes, which are composed of two subunits, one large and one small. The small ribosomal subunit contains an RNA molecule with a sedimentation coefficient of 16S in prokaryotic cells (including bacteria) and 18S in eukaryotic cells. These rRNA molecules are encoded by the cell's own genome; in bacteria, by the 16S rRNA gene. The bacterial 16S rRNA is around 1500 nucleotides long, with some variation across species [2: Bouchet V., Huot H., Goldstein R. Molecular genetic basis of ribotyping. Clin Microbiol Rev. 2008; 21: 262–273].
Several stretches of this gene are highly conserved across all bacterial groups. These conserved sequences are interspersed with regions that show marked variation, the 'hypervariable regions'. Nine such regions have been recognized and are referred to as V1 to V9 (Figure 1). The variations in nucleotide sequence in these hypervariable regions reflect the evolutionary divergence of bacteria. Hence, these sequences provide a reliable basis for the identification and phylogenetic classification of bacterial species. Identification methods based on these sequences do not need prior bacterial culture. They can therefore detect bacteria that grow well in culture as well as those that do not. Further, when these methods are applied to bacterial mixtures, their results provide a relatively unbiased assessment of the relative abundance of various bacterial groups, irrespective of the bacteria's ability to grow in culture or their growth rate.

The molecular techniques developed initially could exploit only two kinds of difference in these hypervariable regions across bacterial species. These were differences in length (e.g. those identified by gel electrophoresis) and major variations in nucleotide sequence (e.g. those detected using restriction fragment length polymorphism). However, in the last 10–15 years, rapid developments in nucleic acid sequencing technology have made high-throughput, massively parallel sequencing widely available at a reasonable price. This has made these techniques virtually the current gold standard for the study of gut microbiota. In these techniques, bacterial nucleic acid is first extracted from the specimen to be analyzed. This is followed by amplification of either the entire 16S rRNA gene or a segment of it that includes one or more selected hypervariable regions.
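The difference between conserved and hypervariable stretches can be made concrete by measuring how much the nucleotides differ at each position of a set of aligned sequences. The following Python sketch does this with Shannon entropy. The four aligned fragments are made-up toy data, not real 16S rRNA sequences. An entropy of 0 marks a fully conserved position, and higher values mark variable ones. Real analyses would use full-length alignments of many reference sequences.

```python
import math
from collections import Counter

def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of the nucleotides at one alignment position."""
    n = len(column)
    probs = [count / n for count in Counter(column).values()]
    return sum(-p * math.log2(p) for p in probs)

def positional_entropy(aligned: list[str]) -> list[float]:
    """Entropy at every position of equal-length, pre-aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(s[i] for s in aligned)) for i in range(length)]

# Hypothetical aligned fragments (not real 16S rRNA data): conserved ends
# flank a short variable stretch in the middle.
fragments = [
    "ACGTTGCAAC",
    "ACGTAGGAAC",
    "ACGTCTTAAC",
    "ACGTGACAAC",
]

for pos, h in enumerate(positional_entropy(fragments), start=1):
    label = "conserved" if h == 0 else "variable"
    print(f"position {pos:2d}: {h:.2f} bits ({label})")
```

Only positions 5 to 7 show non-zero entropy here. In a real alignment, runs of such positions would correspond to the hypervariable regions, and the zero-entropy stretches to the conserved sequences where universal primers can bind.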
Amplification is done by the polymerase chain reaction (PCR), using universal primers that match conserved regions of the bacterial genome flanking the entire 16S rRNA gene or its selected hypervariable region(s). The result is an amplified mixture of 16S rRNA genes, or of their hypervariable fragments, from all the bacteria in the specimen. This mixture can then be resolved using one of several techniques. Examples include electrophoresis-based methods such as denaturing gradient gel electrophoresis and temperature gradient gel electrophoresis, which separate fragments according to differences in their sequence-dependent melting behaviour. Related approaches have detected specific nucleotide sequences by hybridization with labelled probes. Examples include fluorescence in situ hybridization flow cytometry (FISH-flow) [3: Inglis G.D., Thomas M.C., Thomas D.K., Kalmokoff M.L., Brooks S.P., Selinger L.B. Molecular methods to measure intestinal bacteria: a review. J AOAC Int. 2012; 95: 5–23] and bacterial DNA microarrays. The main drawback of these methods is their limited ability to resolve bacterial groups. The differences in 16S rRNA gene length and sequence between closely related bacterial groups are relatively small, which precludes their separation. Such groups include species and genera, and often larger phylogroups such as families and orders. Further, bacterial groups present at low abundance are missed. Hence, these methods have over time been replaced by newer-generation sequencing techniques.

The traditional Sanger technique for nucleic acid sequencing needs relatively pure DNA as the starting material, and provides only one sequence per reaction. It was thus not possible to sequence a specimen containing a mixture of related nucleic acids with this technique, except by cloning each molecule into a separate vector and sequencing each clone. This is a very tedious and costly undertaking.
Since microbiota contain a mixture of bacteria with diverse genomic material, they could not practically be sequenced using this technique. Several newer sequencing technologies, developed over the last 15 years, permit massively parallel sequencing. This means the simultaneous sequencing of each molecule in a DNA mixture, such as that isolated from a microbiota specimen. These techniques, however, pose two major challenges. First, they generate a large amount of data, with the number of sequences from each specimen often reaching several million, which makes analysis demanding. Second, they generally provide much shorter read lengths than Sanger sequencing. Software tools and high computing power that have since become available address both problems. They allow a large number of nucleotide sequences to be matched against a large database. They also allow overlapping short reads to be identified and merged into longer contiguous sequences (so-called contigs). These tools have permitted the widespread use of such sequencing.

Several different technologies were developed and commercialized for massively parallel sequencing. However, most of these have fallen by the wayside. Most current studies of microbiota use one of two instruments from one manufacturer, Illumina: MiSeq (250- or 300-base reads, lower output) and HiSeq (150-base reads, higher output). Because of their limited read lengths, these instruments allow sequencing of only one or two adjacent hypervariable regions of the 16S rRNA gene. This information is enough to determine which types of bacteria are present in a mixed specimen, and their relative frequencies (abundance). This sequence length is reasonable for most practical work, but it may not be sufficient to classify all bacterial species accurately.
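Matching large numbers of short reads against a reference database can be done with fast heuristics instead of full sequence alignment. One simple heuristic compares the set of k-mers (substrings of length k) in a read with those of each reference sequence. The Python sketch below illustrates this and is an assumption for illustration, not the method of any particular tool. The reference names and sequences are hypothetical toy data. The value k = 4 suits only these short strings, and Jaccard similarity stands in for the more elaborate scoring that real classifiers use.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers in either set."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def best_match(read: str, references: dict[str, str], k: int = 4) -> tuple[str, float]:
    """Name and score of the reference with the highest k-mer Jaccard similarity."""
    read_kmers = kmers(read, k)
    scores = {name: jaccard(read_kmers, kmers(ref, k))
              for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical reference "database" (toy sequences, not real 16S rRNA data).
references = {
    "taxon_A": "ACGTTGCAACGGATCCTTAG",
    "taxon_B": "ACGTAGGAACTTGACCGGTA",
}

print(best_match("TTGCAACGGATCC", references))
```

Because each reference is reduced to a set of short substrings, the sets can be precomputed once and reused for millions of reads. This is what makes k-mer approaches fast, at some cost in sensitivity compared with alignment.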
A newer alternative allows sequencing of the full-length bacterial 16S rRNA gene: the more recently developed Single Molecule, Real-Time (SMRT) circular consensus sequencing technology from Pacific Biosciences [4: Wagner J., Coupland P., Browne H.P., Lawley T.D., Francis S.C., Parkhill J. Evaluation of PacBio sequencing for full-length bacterial 16S rRNA gene classification. BMC Microbiol. 2016; 16: 274]. However, given its high cost, this technology has not yet become popular for the study of microbiota.

The study of gut microbiota using newer-generation massively parallel sequencing involves several sequential steps, which are described briefly below.

The accuracy of gut microbiota analysis depends on appropriate selection, collection and pre-processing of specimens. Specimens used to analyze human gut microbiota have included stool, intestinal tissue biopsies and intestinal mucosal lavage material. The latter two are collected during endoscopic examination [5: Tong M., Jacobs J.P., McHardy I.H., Braun J. Sampling of intestinal microbiota and targeted amplification of bacterial 16S rRNA genes for microbial ecologic analysis. Curr Protoc Immunol. 2014; 107 (41.1–11)]. Each of these specimen types has certain advantages and disadvantages. If the aim is to assess how a particular segment of the host gut interacts with microbiota, tissue biopsy may be preferable. It permits assessment of both the host tissue characteristics and the microbiota. However, several parts of the gastrointestinal tract, such as the small intestine, are not easily amenable to biopsy. Biopsies from other parts of the gut may need specific preparation (e.g. bowel lavage before colonic biopsies), which may itself alter the microbiota. Lavage material suffers from the same limitation.
By contrast, fecal specimens contain bacteria from several segments along the length of the gastrointestinal tract, though primarily from the distal gut. They thus provide a good surrogate for bacteria in the colon, where bacterial density in the gastrointestinal tract is highest [5]. Whatever the type of specimen chosen, all specimens in a particular study should be collected, stored and processed in an identical manner. This applies whether they come from one group of subjects at one or multiple time points, or from multiple groups to be compared with each other (e.g. patients and controls). Ideally, all specimens from one study should also be processed simultaneously, in the same laboratory and by the same personnel, to minimize any batch effect [6: Sinha R., Chen J., Amir A., et al. Collecting fecal samples for microbiome analyses in epidemiology studies. Cancer Epidemiol Biomarkers Prev. 2016; 25: 407–416].

In the next step, the specimen is subjected to DNA extraction. Several different protocols have been developed for this step, and they vary with the type of specimen being analyzed [7: Claesson M.J., Jeffery I.B., Conde S., et al. Gut microbiota composition correlates with diet and health in the elderly. Nature. 2012; 488: 178–184; 8: Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012; 486: 207–214; 9: Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012; 486: 215–221]. The results obtained may also vary with the method used.
Hence, the International Human Microbiome Standards (IHMS) Consortium has provided standard operating procedures for specimen collection and DNA extraction (http://www.microbiome-standards.org), so that data obtained can be compared across studies.

Of the nine hypervariable regions (HVRs) of the 16S rRNA gene, V3, V4 and V6 have been the most widely used, along with pairs of adjacent HVRs (e.g. V3–V4 or V4–V5). Of these, the V4–V5 region is particularly suited to the study of microbiota. It provides the most comparable results across platforms [10: Fouhy F., Clooney A.G., Stanton C., Claesson M.J., Cotter P.D. 16S rRNA gene sequencing of mock microbial populations – impact of DNA extraction method, primer choice and sequencing platform. BMC Microbiol. 2016; 16: 123]. It also provides high taxonomic resolution [11: Clooney A.G., Fouhy F., Sleator R.D., et al. Comparing apples and oranges? Next generation sequencing and its impact on microbiome analysis. PLOS ONE. 2016; 11: e0148028; 12: Claesson M.J., Wang Q., O'Sullivan O., et al. Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Res. 2010; 38: e200]. However, sequencing a pair of adjacent HVRs requires a technique with a longer read length than sequencing a single HVR. The region of the 16S rRNA gene to be amplified and sequenced is chosen on three criteria. The first is its ability to accurately classify as many genera or species as possible, which needs input from previous studies in the literature. The second is the level of conservation of its flanking sequences across microbial species (the higher the better). The third is its length, that is, whether the chosen sequencing platform can sequence this length cost-effectively.
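Two of the criteria above can be checked in silico before any laboratory work: whether the primers match the conserved flanking sequences, and whether the region between them is short enough. The Python sketch below does both. It expands IUPAC degenerate bases in a primer into a regular expression and finds the forward primer on a template. It then finds the reverse complement of the reverse primer further downstream, since the reverse primer binds the opposite strand, and reports the length of the region in between. The template, both primers and the 250-nucleotide limit are hypothetical values chosen for illustration. They are not primers or limits recommended by the cited studies.

```python
import re
from typing import Optional

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def primer_regex(primer: str) -> str:
    return "".join(IUPAC[base] for base in primer)

def in_silico_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the region between the two primer sites (primers excluded).

    The forward primer is searched for as written; the reverse primer binds the
    opposite strand, so its reverse complement is searched for downstream.
    Returns None if either primer fails to match.
    """
    f = re.search(primer_regex(fwd), template)
    if f is None:
        return None
    downstream = template[f.end():]
    r = re.search(primer_regex(reverse_complement(rev)), downstream)
    if r is None:
        return None
    return downstream[:r.start()]

# Hypothetical template, primers and length limit (illustration only).
template = "TTAC" + "GTGCCAGC" + "AGTCCGATTACGGATCAAGT" + "ATTAGATACC" + "CGGT"
MAX_REGION_LEN = 250

amplicon = in_silico_amplicon(template, "GTGYCAGC", "GGTATCTRAT")
if amplicon is None:
    print("primers do not match this template")
else:
    print(len(amplicon), "nt between primers; within limit:",
          len(amplicon) <= MAX_REGION_LEN)
```

Running the same check over many reference sequences would show how widely a primer pair matches across bacterial groups. This corresponds to the "level of conservation of the flanking sequences" criterion.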
Once the hypervariable region(s) of the 16S rRNA gene to be studied (the region of interest) have been chosen, custom-designed primers are used to amplify this region by PCR (Figure 2A). Each primer includes a sequence that binds to the conserved sequence flanking the region of interest. It also includes an adapter sequence complementary to the Illumina forward or reverse sequencing primers located on the Illumina sequencing flow cell (see below). The resulting amplification products constitute the DNA library.

Current sequencing methods can generate an enormous amount of data in one run (i.e. one experiment). This is far more than the number of sequences (depth of sequencing) needed to study one specimen adequately. Hence, it is economical to combine multiple specimens in one run. This is easily done by using slightly different reverse primers for different specimens, each containing a unique six-nucleotide 'index' sequence (Figure 2B). The amplification products from each specimen thus carry a different 'index'. These products can then be pooled in roughly equimolar quantities and run in the same sequencing experiment. This pooling of different specimens is referred to as 'multiplexing'. Once the sequence data are obtained, they are computationally segregated (demultiplexed) by reading the 'index' region of each sequence to identify the specimen it came from.

The DNA library (or a mixture of libraries, if multiplexing is done) is loaded onto a flow cell. This resembles a glass slide to which numerous oligonucleotide molecules of two different types (the sequencing primers) are attached. The sequencing primers have sequences complementary to those of the adapters included at the ends of the two amplification primers used to generate the DNA library. Thus, each DNA molecule in the library attaches to the flow cell via one of its adapters and carries the other adapter at its free end.
The flow cell has far more attachment sites than the number of DNA molecules added, so these molecules end up widely separated from each other. Through several steps, described in detail elsewhere [13: Mardis E.R. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008; 9: 387–402], each DNA molecule is then used to generate a local cluster of many identical copies around it. This forms several million distinct clusters on the flow cell, each derived from a separate molecule in the DNA library. In the next steps, the DNA in each cluster is sequenced first in one (forward) direction and then in the opposite (reverse) direction. This generates several million pairs of reads, one pair for each cluster, and hence for each individual DNA molecule in the original library. If the DNA fragments in the library are short enough, the 3′-ends of the two reads (forward and reverse) in each pair overlap, and their data can be merged during analysis (Figure 3).

The two currently available Illumina instruments provide reads of 150 (HiSeq) and 300 (MiSeq) bases in each direction. Merging read pairs can thus generate sequences of up to ∼250 and ∼550 nucleotides, respectively, while allowing for an overlap of ∼50 bases between the opposing reads. Individual HVRs from V2 to V7 have average lengths of 86 to 207 nucleotides. Hence, either platform can be used to sequence any one of these HVRs. In contrast, the average length of the V8 HVR is 322 nucleotides [14: Chaudhary N., Sharma A.K., Agarwal P., Gupta A., Sharma V.K. 16S classifier: a tool for fast and accurate taxonomic classification of 16S rRNA hypervariable regions in metagenomic datasets. PLOS ONE. 2015; 10: e0116106]. Hence, to sequence this HVR, or two adjacent HVRs, paired-end sequencing on the MiSeq platform is advisable.

The raw sequence data contain sequences corresponding to the sequencing adapters and to the primers used for amplification. As the first step, these segments are trimmed away. Paired-end sequencing may have been used, in which each DNA molecule is sequenced in both directions and the two reads partially overlap. In that case, the next step is to merge the paired forward and reverse reads into one read. This has two main advantages. First, since the reads in the two directions overlap only partially, the merger provides a longer read than is possible by reading in only one direction. Second, the merger helps exclude low-quality reads. The quality of raw next-generation sequencing reads declines as sequencing proceeds towards the 3′-end. Thus, when bidirectional sequencing is done, the non-overlapping portions (5′ ends) of the forward and reverse reads represent the best-quality data. The overlapping portions (3′ ends) represent the relatively poor-quality data. The merger process verifies that the overlapping data in the two directions agree. This helps ensure that no errors have crept in and that overall data quality is good.

The sequencing instrument also provides an estimate of data quality (a quality score) for each nucleotide read; a higher score means a lower risk of a reading error. Some bases in some sequences are inevitably of low quality and hence more likely to represent sequencing errors. Quality-control filters are used to identify such poor-quality reads and purge them from the data. Generally, only reads with an average quality score of 30 or above are selected for further analysis. This score corresponds to an expected error rate of one base per 1000 bases or fewer.
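The pre-processing steps described above can be sketched in Python: primer trimming, merging of paired reads, and filtering on average quality. The sketch rests on several simplifying assumptions:

- Quality strings use the common Phred+33 encoding.
- A merged read is accepted only if the overlapping bases match exactly. Real merging tools, such as those in Table 2, tolerate mismatches and use base quality to resolve them.
- The quality filter is applied to each raw read before merging.

The primers and the 30-nucleotide region are made up. The quality score Q and the probability P of a base-calling error are related by the Phred definition, Q = −10 log10 P. A score of 30 therefore corresponds to P = 0.001.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def error_probability(q: float) -> float:
    """Phred definition Q = -10*log10(P), rearranged as P = 10**(-Q/10)."""
    return 10 ** (-q / 10)

def mean_quality(qual: str, offset: int = 33) -> float:
    """Average Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)

def trim_primer(seq: str, primer: str) -> Optional[str]:
    """Strip the primer from the 5' end; reads lacking it are discarded (None)."""
    return seq[len(primer):] if seq.startswith(primer) else None

def merge_pair(fwd: str, rev: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a forward read with the reverse complement of its mate.

    Tries the longest overlap first and requires the overlapping bases to be
    identical; returns None if no overlap of at least min_overlap is found.
    """
    rc = reverse_complement(rev)
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-olen:] == rc[:olen]:
            return fwd + rc[olen:]
    return None

def preprocess(fwd, fwd_qual, rev, rev_qual, primer_f, primer_r, min_q=30.0):
    """Quality-filter, trim primers and merge one read pair; None if rejected."""
    if mean_quality(fwd_qual) < min_q or mean_quality(rev_qual) < min_q:
        return None
    f, r = trim_primer(fwd, primer_f), trim_primer(rev, primer_r)
    if f is None or r is None:
        return None
    return merge_pair(f, r)

# Simulated read pair from a made-up 30-nt region; primers are hypothetical.
region = "ACGTTGCAACGGATCCTTAGGCATGCAAGT"
primer_f, primer_r = "GTGCCAGC", "GGACTACA"
fwd = primer_f + region[:20]
rev = primer_r + reverse_complement(region)[:20]
high_q = "I" * len(fwd)  # 'I' encodes Phred 40 under Phred+33

print(preprocess(fwd, high_q, rev, high_q, primer_f, primer_r) == region)
print(error_probability(30))
```

Each trimmed 20-nucleotide read covers only part of the region, but their 10-nucleotide overlap lets the merged read recover all 30 nucleotides. This mirrors, on a small scale, how 2 × 300-base MiSeq reads yield merged sequences of about 550 nucleotides.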
Widely used open-source tools for primer and adapter trimming, paired-end read merging and quality control are listed in Table 2. Details of the usage, selectable features, strengths and limitations of these tools are usually available on the servers where they are hosted.

Table 2 Popular Bioinformatics Tools Used for 16S rRNA Metagenome Analysis.
Trimming of primers and adapters
   Cutadapt: https://github.com/marcelm/cutadapt
   Sickle: https://github.com/najoshi/sickle
   cutPrimers: https://github.com/aakechin/cutPrimers
   AdapterRemoval: https://github.com/MikkelSchubert/adapterremoval
Quality control
   NGS-QC ToolKit: http://www.nipgr.res.in/ngsqctoolkit.html
   Trimmomatic: http://www.usadellab.org/cms/?page=trimmomatic
   clinQC: https://sourceforge.net/projects/clinqc/
   AfterQC: https://github.com/OpenGene/AfterQC
Merger of paired-end reads
   Pandaseq: https://github.com/neufeld/pandaseq
   PEAR: https://sco.h-its.org/exelixis/web/software/pear/
   FLASH: https://ccb.jhu.edu/software/FLASH/
   MeFiT: https://github.com/nisheth/MeFiT
16S rRNA metagenome analysis pipelines
   QIIME: http://qiime.org/
   MOTHUR: https://www.mothur.org/
   MG-RAST: http://metagenomics.anl.gov/
   MICCA: http://micca.org/

The next step is clustering, or binning, of the pre-processed high-quality sequences into operational taxonomic units (OTUs). Each OTU is a cluster of highly similar nucleotide sequences that are likely to represent one organism or a few closely related organisms [15: Blaxter M., Mann J., Chapman T., et al. Defining operational taxonomic units using DNA barcode data. Philos Trans R Soc Lond B Biol Sci. 2005; 360: 1935–1943]. This presumes that sequences with a high degree of nucleotide identity (usually >97%) belong to the same bacterial species.
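The clustering step can be illustrated with a simple greedy approach that processes sequences in order of abundance. It is similar in spirit to, but much simpler than, the methods used by the pipelines in Table 2. The sketch assumes that all reads have already been trimmed to the same length, so that identity can be computed position by position without alignment; real tools align sequences of differing length. The reads are made up. The 0.97 threshold reflects the species-level cut-off mentioned in the text, applied here as "at least 97% identical".

```python
from collections import Counter

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, int]:
    """Cluster reads into OTUs; returns {representative sequence: read count}.

    Unique sequences are processed from most to least abundant. Each joins the
    first existing OTU whose representative it matches at >= threshold identity;
    otherwise it founds a new OTU and becomes its representative.
    """
    otus: dict[str, int] = {}
    for seq, count in Counter(reads).most_common():
        for rep in otus:
            if identity(seq, rep) >= threshold:
                otus[rep] += count
                break
        else:
            otus[seq] = count
    return otus

# Made-up 40-nt reads: one base differs between base and variant (97.5% identity).
base = "ACGTTGCAACGGATCCTTAGGCATGCAAGTCCGATTACGG"
variant = base[:-1] + "A"
other = "TTTTGGGGCCCCAAAA" * 2 + "TTTTGGGG"  # unrelated sequence
reads = [base] * 5 + [variant] * 2 + [other] * 3

otus = greedy_otus(reads)
total = sum(otus.values())
for rep, n in otus.items():
    print(rep[:12] + "...", n, f"{n / total:.0%}")
```

The variant is absorbed into the OTU of the more abundant base sequence. This is how clustering treats a difference of one or two nucleotides as possible sequencing error. The per-OTU counts, divided by the total, give the relative abundances used in later analysis.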
The assumption that sequences with such high identity belong to the same species serves two purposes. It accommodates intra-species sequence variation, and it also helps overcome the problem of occasional errors introduced during DNA sequencing. For instance, if two sequences differ by only 1–2 nucleotides, the difference may not be real but due to sequencing errors. It therefore makes sense to treat them as one. A lower clustering threshold of 95% is used for genus-level analysis [16: Drancourt M., Bollet C., Carlioz A., Martelin R., Gayral J.P., Raoult D. 16S ribosomal DNA sequence analysis of a large collection of environmental and clinical unidentifiable bacterial isolates. J Clin Microbiol. 2000; 38: 3623–3630]. Clustering also reduces the large set of sequences, usually numbering hundreds of thousands, to representative consensus sequences for a limited number of clusters (OTUs). Each cluster also has a count of the sequences it contains. This reduces the run time of subsequent steps in data analysis. A representative sequence from each OTU is then mapped to a reference 16S rRNA sequence database, and the OTU is assigned the taxonomy of the closest match found. By doing this for all OTUs, one can obtain information on the various types of bacteria present, and the relative abundance of each, in a particular specimen.
References