Molecular characterization of the human microbiome from a reproductive perspective
2015; Elsevier BV; Volume: 104; Issue: 6; Language: English
10.1016/j.fertnstert.2015.10.008
ISSN 1556-5653
Authors: Amir Mor, Paul H. Driggers, James H. Segars
Topic(s): Pelvic floor disorders treatments
Abstract: The process of reproduction inherently poses unique microbial challenges because it requires the transfer of gametes from one individual to the other while protecting the gametes and the individuals from harmful microbes during the process. Advances in molecular biology techniques have expanded our understanding of the natural organisms living on and in our bodies, including those inhabiting the reproductive tract. Over the past two decades accumulating evidence has shown that the human microbiome is tightly related to health and disease states involving the different body systems, including the reproductive system. Here we introduce the science involved in the study of the human microbiome. We examine common methods currently used to characterize the human microbiome as an inseparable part of the reproductive system. Finally, we consider a few limitations, clinical implications, and the critical need for additional research in the field of human fertility.
Discuss: You can discuss this article with its authors and with other ASRM members at http://fertstertforum.com/mora-characterization-reproductive-microbiome/

The inter-relationship between DNA and microbiology began in 1869 with the discovery of nucleic acids by Johannes Friedrich Miescher, a young Swiss physician and biochemist (1 Dahm R. Friedrich Miescher and the discovery of DNA. Dev Biol. 2005; 278: 274-288). Dr. Miescher studied pus on fresh surgical bandages, which he collected from the nearby surgical clinic. In pus, Dr. Miescher found the ideal and sufficient base material for his analyses. His discovery of the nucleotides that constitute DNA was made possible thanks to a large aggregate of bacteria together with human leukocytes. Following this discovery, for more than 100 years culture-based methods were used for isolation of microbes. Interestingly, cultivation-independent DNA sequencing methods are now being used to detect colonization by microbes. Thus, the evolution of molecular approaches fostered by sequencing of the genome led to a paradigm shift in understanding about microbes, the human body, DNA, and the human microbiome.

The human microbiome was defined in 2001 by Joshua Lederberg, an American molecular biologist. The human microbiome may be defined as the totality of micro-organisms and their collective genetic material present in or on the human body. The introduction of cultivation-independent techniques, such as DNA sequencing, combined with the knowledge derived from the classic cultivation-dependent techniques, has revealed surprising information that oftentimes contradicts what was considered dogma only a decade ago. For example, Steel et al. (2 Steel J.H. Malatos S. Kennea N. Edwards A.D. Miles L. Duggan P. et al. Bacteria and inflammatory cells in fetal membranes do not always cause preterm labor. Pediatr Res.
2005; 57: 404-411) have shown that placental tissues derived from elective, term cesarean deliveries contained organisms in 70% of placental membranes. This finding indicates that the presence of micro-organisms alone does not induce preterm labor. However, intra-amniotic culture-independent (uncultivated) bacteria were recovered from pregnant women who had confirmed histologic intra-amniotic inflammation, and subsequently, preterm birth (3 Han Y.W. Shen T. Chung P. Buhimschi I.A. Buhimschi C.S. Uncultivated bacteria as etiologic agents of intra-amniotic inflammation leading to preterm birth. J Clin Microbiol. 2009; 47: 38-47). Colonization at birth is a normal process, and van Nimwegen et al. (4 van Nimwegen F.A. Penders J. Stobberingh E.E. Postma D.S. Koppelman G.H. Kerkhof M. et al. Mode and place of delivery, gastrointestinal microbiota, and their influence on asthma and atopy. J Allergy Clin Immunol. 2011; 128: 948-955.e1-3) have shown that the neonatal microbiome can differ according to the mode of delivery, and the mode of delivery can be correlated to atopic diseases later in childhood.

Cultivation-dependent and -independent techniques have also broadened understanding of the spread and colonization of the normal human microbiome at different anatomic sites of the reproductive tract. For instance, the upper genital tract was previously considered to be sterile, but endometrial cultures obtained surgically at hysterectomy have demonstrated the presence of one or more micro-organisms in the uterus in nearly one-quarter of asymptomatic women examined (5 Moller B.R. Kristiansen F.V. Thorsen P. Frost L. Mogensen S.C. Sterility of the uterine cavity. Acta Obstet Gynecol Scand. 1995; 74: 216-219).
Furthermore, using advanced molecular biology techniques, Mitchell et al. (6 Mitchell C.M. Haick A. Nkwopara E. Garcia R. Rendi M. Agnew K. et al. Colonization of the upper genital tract by vaginal bacterial species in nonpregnant women. Am J Obstet Gynecol. 2015; 212: 611.e1-611.e9) have recently provided additional evidence that the upper genital tract in asymptomatic women is not a sterile environment. In their study, the vast majority of women (55 of 58 [95%]) tested positive for at least one species of bacteria. Using similar techniques, Aagaard et al. (7 Aagaard K. Ma J. Antony K.M. Ganu R. Petrosino J. Versalovic J. The placenta harbors a unique microbiome. Sci Transl Med. 2014; 6: 237ra65) have demonstrated that the placenta harbors a unique low-abundance microbiome. Additionally, a recent review by Payne and Bayatibojakhi (8 Payne M.S. Bayatibojakhi S. Exploring preterm birth as a polymicrobial disease: an overview of the uterine microbiome. Front Immunol. 2014; 5: 595) summarized the evidence regarding the relationship between oral cavity bacteria and preterm birth through hematogenous spread to the placenta. These examples raise the question: what constitutes a balanced (symbiotic or commensal) microbiome, and what makes it a diseased, parasitic, or harmful microbiome?

Advanced technological tools in molecular biology have allowed researchers to "look deeper" into the microbiome world and have revealed an enormous amount of information that was not previously accessible. Technological breakthroughs, such as high-throughput methods for DNA sequencing, enabled examination of the same sources studied by Miescher, and others, with more powerful tools that reveal a deeper level of understanding and new conclusions.
For example, it is now accepted that the human body contains 10^13–10^14 symbiotic microbial cells (9 Savage D.C. Microbial ecology of the gastrointestinal tract. Annu Rev Microbiol. 1977; 31: 107-133), which outnumber our own body cells. Thanks to worldwide human genome and microbiome projects, we now know that there are 3.3 million microbial genes in the human gut microbiome alone (10 Qin J. Li R. Raes J. Arumugam M. Burgdorf K.S. Manichanh C. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010; 464: 59-65), as compared with only 20,000–25,000 protein-coding genes present in the entire human genome (11 International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004; 431: 931-945).

The fact that the human body harbors bacteria was first described by Antonie van Leeuwenhoek in the 17th century (12 Dobell C. The discovery of the intestinal protozoa of man. Proc R Soc Med. 1920; 13: 1-15). Interestingly, the human cells themselves harbor ancient bacteria, which are the mitochondria with their mitochondrial (bacterial) DNA (13 Iborra F.J. Kimura H. Cook P.R. The functional organization of mitochondrial genomes in human cells. BMC Biol. 2004; 2: 9). Within the past decade, with the newly gathered information provided by high-throughput analyses, we have begun to question what was previously considered impossible. Does the uterus have its own microbiome? Is the normal healthy fetus growing in a nonsterile environment? Is there a chance that the most common bacteria associated with chorioamnionitis are not isolable by culture? Are certain types of lactobacilli necessary for normal fecundity?
These are only a few of the questions that have arisen in the field of reproduction.

One of the main goals of The Human Microbiome Project, a 5-year project launched by the National Institutes of Health in 2007, was to explore the relationship between disease and the changes in the human microbiome. A central tenet is that most of the microbiome cannot be easily cultured, and therefore information collected through bacterial cultures is very limited and does not represent the actual human microbiome repertoire in normal or disease states. A relatively new way to obtain information on the microbiome is by conducting high-throughput DNA sequencing and analyses. Samples obtained from the skin, gastrointestinal tract, and vagina naturally contain human and bacterial DNA. In actual biological samples, multiple DNA strands can be sequenced simultaneously (see below). Prior knowledge about the origin or function of the DNA sequences found can provide an incredible amount of information, such as disease outbreaks, bacterial virulence, and pathogenic strains, within a relatively short period.

The nature of samples required and DNA extraction

For the purpose of DNA sequencing, biological specimens can be simply collected with a swab. There is no need for a culture medium because there is no need to keep the microbes alive. There are multiple protocols for DNA extraction. Most of the protocols contain the following steps. [1] Cell lysis to expose the DNA within the bacteria. This is done by chemical and physical methods such as detergents, blending, grinding, or sonication. [2] Removal of the membrane lipids by adding a detergent or surfactants. [3] Removal of proteins by adding a protease. [4] Removal of the RNA by adding an RNase. The microbial DNA is now free and has to be purified.
Commonly used procedures for DNA purification are anion exchange, phenol–chloroform extraction, or silica-based strategies.

Basic biology of the various molecular techniques

The first bacterial genome was sequenced in 1995 (14 Fleischmann R.D. Adams M.D. White O. Clayton R.A. Kirkness E.F. Kerlavage A.R. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995; 269: 496-512). Since then, bacterial DNA databases have grown rapidly (15 Medini D. Serruto D. Parkhill J. Relman D.A. Donati C. Moxon R. et al. Microbiology in the post-genomic era. Nat Rev Microbiol. 2008; 6: 419-430), and today's technology enables the analyses of millions of different DNA sequences obtained from a single sample. Large amounts of information can be obtained, but conducting even a targeted sequencing of specific genes in the sample may be a complex and costly endeavor. There are four commonly used techniques: fingerprinting, DNA microarrays, targeted sequencing, and whole-genome sequencing (WGS, using the principles of the Sanger and pyrosequencing techniques).

The fingerprinting technique relies on the amplification of a specific gene, typically the bacterial 16S ribosomal RNA (rRNA) gene (see below), or the amplification of variable number tandem repeats by polymerase chain reaction (Fig. 1). The different variants in the sample are then separated by gel electrophoresis (16 Anderson I.C. Cairney J.W. Diversity and ecology of soil fungal communities: increased understanding through the application of molecular techniques. Environ Microbiol. 2004; 6: 769-779). The differentiation among the variants is based on the different gel electrophoresis band patterns rather than the actual sequencing of the gene.
Therefore, the fingerprinting technique is significantly cheaper than sequencing-based techniques, and it is useful for clustering bacterial communities according to changes in the dominant members across different samples (17 Fierer N. Jackson R.B. The diversity and biogeography of soil bacterial communities. Proc Natl Acad Sci U S A. 2006; 103: 626-631). However, on the basis of the gel electrophoresis band patterns, only the few most abundant members of the community are detected, and therefore there is a limited range of detection (dynamic range). The advantage of sequencing over fingerprinting is a greater dynamic range. Gene sequencing, not limited to 16S rRNA, provides information with a higher resolution that enables answering questions such as this: which specific bacterial genes or species contribute to differences among communities (including functional differences)? However, this advantage comes at a higher cost and requires more complex analyses.

Targeted sequencing and DNA microarrays are two intermediate techniques that lie between fingerprinting and WGS (Fig. 1). These intermediate techniques provide a greater dynamic range than fingerprinting, at a lower cost than WGS. One of the disadvantages of the intermediate techniques in comparison with WGS is that they are based on known sequences, which makes them less useful in cases in which new lineages with no close relatives are dominant. Targeted sequencing techniques use, for example, the intergenic spacer of the rRNA (18 Fisher M.M. Triplett E.W. Automated approach for ribosomal intergenic spacer analysis of microbial diversity and its application to freshwater bacterial communities. Appl Environ Microbiol. 1999; 65: 4630-4636) or sequencing of the V6 hypervariable region of the 16S rRNA (19 Sogin M.L. Morrison H.G. Huber J.A. Mark Welch D. Huse S.M. Neal P.R.
et al. Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci U S A. 2006; 103: 12115-12120).

Deoxyribonucleic acid microarray techniques are used to screen known sequences, such as the 16S rRNA and functional gene sequence libraries (20 He Z. Gentry T.J. Schadt C.W. Wu L. Liebich J. Chong S.C. et al. GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J. 2007; 1: 67-77, 21 Wilson K.H. Wilson W.J. Radosevich J.L. DeSantis T.Z. Viswanathan V.S. Kuczmarski T.A. et al. High-density microarray of small-subunit ribosomal DNA probes. Appl Environ Microbiol. 2002; 68: 2535-2541). Short sequences that are part of the known gene or the whole gene sequence are printed on a "chip" and ready to hybridize with the fluorescent/tagged DNA (22 DeSantis T.Z. Brodie E.L. Moberg J.P. Zubieta I.X. Piceno Y.M. Andersen G.L. High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb Ecol. 2007; 53: 371-383). There are several general types of DNA microarrays aimed at serving different goals. Complementary DNA microarrays (cDNA arrays) are used to assess expression. Comparative genomic hybridization is used to describe hybridization against an index chromosomal spread to identify the presence or absence of genes in a technique called array comparative genomic hybridization (the "whole genome microarray" technique). The probes can be the whole gene sequence, and therefore microarrays can serve as a tool for investigating variations at the genomic level (23 Garaizar J. Rementeria A. Porwollik S. DNA microarray technology: a new tool for the epidemiological typing of bacterial pathogens? FEMS Immunol Med Microbiol.
2006; 47: 178-189). An oligonucleotide microarray, on the other hand, uses short oligonucleotides as probes and is usually used to identify single-nucleotide polymorphisms.

Whole-genome sequencing determines the complete DNA sequence of an organism's genome at a single time. Both the Sanger and pyrosequencing methods (and their combination) can be used for genome sequencing (Fig. 1). The traditional Sanger method produces 650–800-bp reads with a maximum output of 0.44 Mb per run (14 Fleischmann R.D. Adams M.D. White O. Clayton R.A. Kirkness E.F. Kerlavage A.R. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995; 269: 496-512, 24 Binnewies T.T. Motro Y. Hallin P.F. Lund O. Dunn D. La T. et al. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries. Funct Integr Genomics. 2006; 6: 165-185), whereas the Next Generation Sequencing systems are able to produce an output as high as 600 Gb (1 gigabase = 1 billion bases) per run (25 Liu L. Li Y. Li S. Hu N. He Y. Pong R. et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol. 2012; 2012: 251364).

16S rRNA gene sequencing

In practice, 16S rRNA gene sequencing is mostly used for bacterial species identification. As noted above, a common strategy to detect bacterial DNA is to sequence the gene encoding ribosomal RNA, and more specifically, the 16S rRNA gene, given that different bacterial families contain different 16S rRNA gene sequences. The 16S rRNA gene contains both fast- and slow-evolving regions, and therefore can be used for analysis of phylogenetic relationships at different taxonomic layers. When the appropriate region of the 16S rRNA gene is chosen, 250-base reads are sufficient even for taxonomic assignment (26 Liu Z. DeSantis T.Z. Andersen G.L. Knight R.
Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers. Nucleic Acids Res. 2008; 36: e120, 27 Wang Q. Garrity G.M. Tiedje J.M. Cole J.R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007; 73: 5261-5267, 28 Liu Z. Lozupone C. Hamady M. Bushman F.D. Knight R. Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Res. 2007; 35: e120). However, defining a new bacterial phylum, or handling a case in which the obtained sequences do not match any of the reference databases, requires longer or full-length sequencing. Currently there is no consensus on what should be the "single best region" for taxonomy assignment. The 16S rRNA gene contains nine variable (V1–V9) regions (29 Neefs J.M. Van de Peer Y. De Rijk P. Goris A. De Wachter R. Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res. 1991; : 1987-2015), of which the areas surrounding V2, V4, and V6 are more commonly used for sequencing. The combination of moderately conserved and variable regions is likely to be the optimal approach for performing analyses at different phylogenetic resolutions. Along with choosing the appropriate region(s), primer design and/or selection are critical. Improper selection of primers may result in underrepresentation of bacterial communities and/or reaching different biological conclusions (26 Liu Z. DeSantis T.Z. Andersen G.L. Knight R. Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers. Nucleic Acids Res. 2008; 36: e120, 30 Andersson A.F. Lindberg M. Jakobsson H. Backhed F. Nyren P. Engstrand L.
Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS One. 2008; 3: e2836).

Bioinformatics requirements and modeling

The sequencing platforms in common use are the Roche 454 Genome Sequencer FLX and FLX Titanium technologies (pyrosequencing based), and the Illumina HiSeq 2500 and MiSeq platforms. Although the 454 FLX technologies have higher error rates and lower read numbers per run than the Illumina technologies, they have the advantage of greater read lengths (up to approximately 1,000 bp per read for FLX, versus up to approximately 200–300-bp paired-end reads for HiSeq 2500, depending on the run type and version number). The Illumina MiSeq v3 platform can produce approximately 600-bp paired-end reads but at a lower number of reads per colony compared with HiSeq 2500. The Illumina technologies give higher sequencing depths (i.e., more reads per colony) at lower cost per Gb of sequence. Read lengths can affect resolution of taxonomic analyses (see below), so there is a tradeoff between accuracy, sequencing depth, and associated costs that is considered in selecting the sequencing platform. Commercial sequencing laboratories frequently offer the choice of multiple platforms.

Once sequences are collected, they are analyzed to characterize community composition regarding genera and species present, as well as their proportions within the sampled community. Sequences are analyzed and organized into operational taxonomic units (OTUs) that provide working names or designations for groups of detected 16S sequences by searching against publicly available taxonomic databases, such as Greengenes, SILVA, and the Ribosomal Database Project. This approach, sometimes referred to as "closed OTU picking," is sufficient for matching to known rRNA gene sequences of previously characterized organisms, but as stated above, it is problematic for novel sequences representing uncharacterized organisms.
Because these databases are routinely updated, investigators' ability to use them for accurate taxonomic evaluations of microbial communities will continue to improve.

Alternatively, sequences may be analyzed by computational clustering of amplicon sequences into OTUs, sometimes referred to as "de novo OTU picking." This approach uses software that applies some variety of statistical learning algorithm to cluster sequences into OTUs. In these methods, two-dimensional distance matrices are populated with sequences, and the degree of similarity between each pair of sequences is calculated on the basis of sequence alignments. Clustering analyses (supervised or unsupervised, depending on prior knowledge about the biological samples) are then performed, and degrees of relatedness of different sequences are based on similarity scores. Because each sequence is compared with every other sequence in this process, the computational complexity is quadratic: it grows with the square of the number of sequences being analyzed (sequencing depth). Improvements in the method have utilized various data preprocessing and filtering steps to reduce the number of alignments required. Additionally, "greedy" clustering algorithms have been used that reduce computational complexity. Frequently, multiple clustering methods are employed. Noise reduction methods are introduced to reduce the number of polymerase chain reaction and sequencing errors that lead to inflation in the numbers of OTUs calculated. Calculated OTUs are then used to infer species and community composition and diversity. Some computational pipelines combine closed and de novo OTU picking methods. Owing to the very large volumes of data involved, statistical machine learning methods, such as principal components analysis or principal coordinates analysis, are usually used for dimensionality reduction to facilitate the visualization of community compositional differences between biological samples.
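The "greedy" de novo clustering idea described above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular package. It assumes, for simplicity, that reads are already aligned and of equal length, so identity is just the fraction of matching positions; real pipelines compute identity from sequence alignments and add preprocessing and noise reduction. The function names `identity` and `greedy_otu_clustering` and the toy reads are hypothetical. Each read is compared only with the current OTU representatives, not with every other read, which is how greedy methods avoid filling a full pairwise distance matrix.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length reads.

    Simplifying assumption for this sketch: no gaps, no real alignment step.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned reads of equal length")
    if not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_otu_clustering(reads, threshold=0.97):
    """Cluster reads into OTUs greedily and return {representative: read count}.

    Unique sequences are visited from most to least abundant. A sequence joins
    the first existing representative it matches at or above the threshold;
    otherwise it becomes the representative of a new OTU.
    """
    counts = Counter(reads)
    # Most abundant first; ties broken alphabetically so results are reproducible.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))

    otus = {}  # representative sequence -> number of reads assigned to it
    for seq in ordered:
        for rep in otus:
            if identity(seq, rep) >= threshold:
                otus[rep] += counts[seq]
                break
        else:
            otus[seq] = counts[seq]
    return otus


# Toy example with 10-base reads and a 90% threshold (real 16S analyses
# commonly use longer reads and a stricter threshold).
toy_reads = ["ACGTACGTAC"] * 3 + ["ACGTACGTAA"] + ["TTTTGGGGCC"] * 2
print(greedy_otu_clustering(toy_reads, threshold=0.9))
# {'ACGTACGTAC': 4, 'TTTTGGGGCC': 2}
```

In the worst case (every read forms its own OTU) this sketch still performs a quadratic number of comparisons; the saving comes when many reads collapse onto a small number of representatives.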
Rarefaction techniques are used to estimate the richness of microbial species in a community or sample: rarefaction curves plot the number of species detected as a function of the number of samples assessed, or the number of OTUs calculated per number of sequences analyzed.

New and improved procedures for every step in data processing and analysis are constantly evolving and are made publicly available, so there is an array of software choices to consider for inclusion in one's analysis pipeline. Particularly useful in this regard are routinely updated software packages that include programs for each step in the process. QIIME (Quantitative Insights Into Microbial Ecology), an open-source Python-scripted framework, is currently popular. QIIME can be run on Mac and Linux systems and in virtual machines on Windows systems. Additionally, machine images are available to run QIIME in an Elastic Compute Cloud (EC2) instance on Amazon Web Services, allowing distributed execution with configurable compute capacity. This can be a relatively low-cost option for laboratories with limited access to genomics workstations or scientific compute clusters. Mothur is a similar, widely used framework. Both have helpful online fora and usage tutorials.

For whole-genome shotgun sequencing, microbial genomic DNA purified from biological samples is mechanically sheared, tagged, and sequenced. The same high-throughput, next-generation sequencing platforms are used as described above. Reads are algorithmically assembled into progressively larger contigs and scaffolds. As noted above, this approach provides information about which organisms are present, as well as potential metabolic profiles of the sampled communities, based on functions of genes detected. Because fewer amplification steps are used for genomic DNA sequencing, amplification-associated biases are less significant.
However, some of the same error considerations still apply, owing to relative accuracies of different sequencing chemistries (e.g., pyrosequencing or Nanopore sequencing), DNA extraction biases, and sample complexity. Other factors influence the inherent difficulty and reliability of genome assembly. Misassemblies frequently result from the occurrence of repetitive DNA sequences in bacterial genomes and from the attribution of sequences from more than one species to chimeric contigs that do not exist. Assemblies can be constructed by comparison with pre-existing reference genomes, using software for this purpose such as Newbler, AMOS (31 Treangen T.J. Sommer D.D. Angly F.E. Koren S. Pop M. Next generation sequence assembly with AMOS. Curr Protoc Bioinformatics. 2011; (Chapter 11: Unit 11.8)), and MetAMOS (32 Treangen T.J. Koren S. Sommer D.D. Liu B. Astrovskaya I. Ondov B. et al. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol. 2013; 14: R2). De novo assembly algorithms use graph traversal algorithms to determine assembly solutions in which nodes represent reads or k-mers derived from reads (where k is an arbitrarily assigned integer) and edges represent directed overlaps between nodes. Populated graphs can be enormous, and therefore simplification methods must be used to remove redundant information.

In recent years, assembly algorithms have used novel data structures to reduce computational complexity, thereby making overlap-based assembly feasible for much larger data sets (33 Simpson J.T. Durbin R. Efficient construction of an assembly string graph using the FM-index. Bioinformatics. 2010; 26: i367-i373).
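To make the graph formulation above concrete, here is a minimal Python sketch in which nodes are the distinct k-mers taken from the reads and a directed edge joins two k-mers when the last k-1 bases of one equal the first k-1 bases of the other. It is only an illustration under stated assumptions (error-free reads, no repeats, a single genome) and does not reproduce how any named assembler works. The function names `kmers`, `build_overlap_graph`, and `assemble_unambiguous` and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(read: str, k: int):
    """Return all k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_overlap_graph(reads, k: int):
    """Map each distinct k-mer (node) to the sorted list of k-mers it overlaps.

    An edge u -> v exists when the (k-1)-base suffix of u equals the
    (k-1)-base prefix of v.
    """
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))

    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)

    return {node: sorted(by_prefix.get(node[1:], ())) for node in nodes}


def assemble_unambiguous(edges):
    """Walk from each node with no incoming edge while the path is unambiguous.

    The walk stops at a branch (more than one successor), at a node with
    several predecessors, or at a dead end. Each walk yields one contig.
    """
    indegree = Counter()
    for successors in edges.values():
        for succ in successors:
            indegree[succ] += 1

    contigs = []
    for start in sorted(n for n in edges if indegree[n] == 0):
        contig, node, seen = start, start, {start}
        while len(edges[node]) == 1:
            nxt = edges[node][0]
            if indegree[nxt] != 1 or nxt in seen:
                break
            contig += nxt[-1]  # each step extends the contig by one base
            seen.add(nxt)
            node = nxt
        contigs.append(contig)
    return contigs


# Three overlapping toy reads drawn from the sequence ATGGCGTGCAAT.
toy_reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_overlap_graph(toy_reads, k=4)
print(assemble_unambiguous(graph))
# ['ATGGCGTGCAAT']
```

With repeats or mixed genomes the graph branches, and a walk of this kind stops early, which is one reason real assemblers need graph simplification steps.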
Sources of ambiguity for these analyses include the occurrence of regions of sequence similarity between different genomes, variations in sequencing depth between different regions of the same genome, repetitive sequences, and the occurrence of localized sequence polymorphisms between the genomes of closely related bacterial species.

Software using de novo assembly algorithms includes ABySS (34 Simpson J.T. Wong K. Jackman S.D. Schein J.E. Jones S.J. Birol I. ABySS: a parallel assembler for short read sequence data. Genome Res. 2009; 19: 1117-1123), SOAPdenovo (35 Li R. Zhu H. Ruan J. Qian W. Fang X. Shi Z. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 2010; 20: 265-272), Velvet (36 Zerbino D.R. Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008; 18: 821-829), MetaVelvet (37 Namiki T. Hachiya T. Tanaka H. Sakakibara Y. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Res. 2012; 40: e155), MetaVelvet-SL (38 Afiahayati Sato K. Sakakibara Y. MetaV