The impact of next-generation sequencing technology on genetics
2008; Elsevier BV; Volume 24, Issue 3. Language: English
DOI: 10.1016/j.tig.2007.12.007
ISSN: 1362-4555
Topic(s): Molecular Biology Techniques and Applications
Abstract: If one accepts that the fundamental pursuit of genetics is to determine the genotypes that explain phenotypes, the meteoric increase of DNA sequence information applied toward that pursuit has nowhere to go but up. The recent introduction of instruments capable of producing millions of DNA sequence reads in a single run is rapidly changing the landscape of genetics, providing the ability to answer questions with heretofore unimaginable speed. These technologies will provide an inexpensive, genome-wide sequence readout as an endpoint to applications ranging from chromatin immunoprecipitation, mutation mapping and polymorphism discovery to noncoding RNA discovery. Here I survey next-generation sequencing technologies and consider how they can provide a more complete picture of how the genome shapes the organism.

First described by Sanger et al. in 1977 [1: Sanger F. et al. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U. S. A. 1977; 74: 5463-5467], dideoxynucleotide sequencing of DNA has undergone a steady metamorphosis from a cottage industry into a large-scale production enterprise that requires a specialized and devoted infrastructure of robotics, bioinformatics, computer databases and instrumentation. In the process of this metamorphosis, the cost per reaction of DNA sequencing has fallen with Moore's Law-like precision [2: Moore G. Cramming more components onto integrated circuits. Electronics. 1965; 38; available at ftp://download.intel.com/research/silicon/moorespaper.pdf] (Moore's Law describes the trend in the history of computer hardware whereby the number of transistors that can be placed on an integrated circuit increases exponentially, doubling approximately every 2 years). This phenomenon has been especially true in the last 5 years, largely because of the efforts necessary to sequence the human genome. Although still conducted at a subsistence level in the single-investigator, departmental or university core facility setting, high-throughput DNA sequencing is so specialized that it is performed at only a handful of sites (i.e. http://genome.wustl.edu, http://www.broad.mit.edu/, http://www.hgsc.bcm.tmc.edu/, http://www.sanger.ac.uk/). However, just as state-of-the-art high-throughput DNA sequencing seemed to be reaching its zenith at these sequencing centers, several new sequencing instruments (so-called ‘next generation’ or ‘massively parallel’) have become available and are already transforming the field. Their impact on genomics is in turn causing a revolution in genetics that, because of a variety of factors, will fundamentally change the nature of genetic experimentation.
When coupled with the appropriate computational algorithms, our ability to answer questions about the mutational spectrum of an organism, from single-base changes to large copy number polymorphisms, on a genome-wide scale is likely to radically alter our understanding of model organisms and ultimately of ourselves. Here, I review a subset of the studies that have been enabled by next-generation sequencing platforms, to give an appreciation of the breadth and depth of their potential. What sets next-generation sequencers apart from conventional capillary-based sequencing? Principally, the ability to process millions of sequence reads in parallel rather than 96 at a time. This massively parallel throughput may require only one or two instrument runs to complete an experiment. Also, next-generation sequence reads are produced from fragment ‘libraries’ that have not been subject to the conventional vector-based cloning and Escherichia coli–based amplification stages used in capillary sequencing. As such, some of the cloning bias issues that affect genome representation in sequencing projects may be avoided, although each sequencing platform may have its own associated biases. The workflow to produce next-generation sequence-ready libraries is straightforward: DNA fragments that may originate from a variety of front-end processes (described below) are prepared for sequencing by ligating specific adaptor oligos to both ends of each DNA fragment. Importantly, relatively little input DNA (a few micrograms at most) is needed to produce a library. These platforms also have the ability to sequence the paired ends of a given fragment, using a slightly modified library process. This approach can be used, for example, if a de novo genome sequence is to be assembled from the next-generation data. Finally, next-generation sequencers produce shorter read lengths (35–250 bp, depending on the platform) than capillary sequencers (650–800 bp), which also can affect the utility of the data for applications such as de novo assembly and genome resequencing (Box 1). Because these platforms are so new, the accuracy of their sequencing reads and the associated quality values are not yet well understood, although many laboratories have efforts underway to benchmark them relative to capillary electrophoresis. Aside from these generally shared features, the three commercially available sequencers differ significantly (see Table 1 for a comparison of current specifications) and are described below.

Box 1. The ‘$1000’ genome – toward personalized genomics?

Next-generation sequencing technologies, by enabling vast data generation, will provide a comprehensive picture of normal human genome variation in the next few years. This will set the baseline against which genome variation in a genetic disease cohort can be evaluated. Efforts to couple the discovered variations to the disease biology will provide functional annotations for gene variants that predict disease susceptibility and genetic risk factors and provide pharmacokinetic profiles. Targeted treatments might also be suggested that selectively block the impact of certain variants. At this point, inexpensive (e.g. $1000) genome sequencing as a clinical assay or a point of entry to health insurance or medical care becomes meaningful. However, the current cost of resequencing a human genome is high, even with next-generation technology. One reason is that short reads (e.g.
35–50 bases) likely will require ∼25- to 30-fold oversampling, or ‘coverage’, of the genome to ensure that both chromosomal pairs (haplotypes) are sampled sufficiently to capture all the genetic information. At 3 Gb per run, 25–30 instrument runs would be required (the human genome is ∼3 Gb) to provide this coverage, costing ∼$700 000 per genome. In addition to requiring more coverage, short reads also are limited in their power to detect sequence variation in the genome, based on their uniqueness. For example, if a given 32-base sequence is found more than once in the reference sequence, that sequence is eliminated as a potential site of variant detection because of uncertainty in aligning a 32-base sequence read (even one that contains one or more potentially variant bases) at the correct genomic location.

Table 1. Comparing metrics and performance of next-generation DNA sequencers

Metric | Roche (454) | Illumina | SOLiD
Sequencing chemistry | Pyrosequencing | Polymerase-based sequencing-by-synthesis | Ligation-based sequencing
Amplification approach | Emulsion PCR | Bridge amplification | Emulsion PCR
Paired ends/separation | Yes/3 kb | Yes/200 bp | Yes/3 kb
Mb/run | 100 Mb | 1300 Mb | 3000 Mb
Time/run (paired ends) | 7 h | 4 days | 5 days
Read length | 250 bp | 32–40 bp | 35 bp
Cost per run (total direct[a]) | $8439 | $8950 | $17 447
Cost per Mb | $84.39 | $5.97 | $5.81

[a] Total direct costs include the reagents and consumables, the labor, instrument amortization cost and the disc storage space required for data storage/access.
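To make the arithmetic in Box 1 concrete, the short sketch below recomputes the number of runs and an approximate direct cost from the figures quoted above (a ∼3 Gb genome, ∼30-fold coverage, ∼3 Gb of sequence per run). The per-run cost is taken from the SOLiD column of Table 1 as an assumption, so this is an order-of-magnitude illustration rather than a derivation of the ∼$700 000 estimate quoted in the box.

```python
# Back-of-the-envelope version of the Box 1 coverage/cost argument.
# All inputs are the figures quoted above; cost_per_run uses the SOLiD
# value from Table 1 and is an assumption for illustration only.

def resequencing_estimate(genome_gb=3.0, coverage=30,
                          yield_gb_per_run=3.0, cost_per_run=17447):
    """Return (instrument runs, approximate direct cost in dollars)."""
    total_sequence_gb = genome_gb * coverage      # bases that must be generated
    runs = total_sequence_gb / yield_gb_per_run   # runs needed at the stated yield
    return runs, runs * cost_per_run

runs, cost = resequencing_estimate()
print(f"~{runs:.0f} runs, roughly ${cost:,.0f} in direct run costs")
# -> ~30 runs, roughly $523,410 (the same order as the ~$700 000 quoted above)
```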
First commercially introduced in 2004, the Roche (454) sequencer works on the principle of ‘pyrosequencing’, which uses the pyrophosphate molecule released on nucleotide incorporation by DNA polymerase to fuel a downstream set of reactions that ultimately produces light from the cleavage of oxyluciferin by luciferase [3: Margulies M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005; 437: 376-380] (Figure 1). Instead of sequencing in discrete tubes or in microtiter plate wells, the DNA strands of the library are amplified en masse by emulsion PCR [4: Dressman D. et al. Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 8817-8822] on the surfaces of hundreds of thousands of agarose beads. The surfaces of these beads carry millions of attached oligomers, each of which is complementary to the adaptor sequences that were ligated to the fragment ends during library construction (as described above). Emulsion PCR uses a vigorously mixed oil and aqueous mixture to isolate individual agarose beads, each having a single unique DNA fragment hybridized to its oligo-decorated surface, in aqueous micelles that also contain the PCR reactants. By pipetting these tiny micelles into the wells of a conventional microtiter plate and performing temperature cycling, one can produce >1 000 000 sequence-ready 454 beads in a matter of hours. Each agarose bead surface then carries up to 1 000 000 copies of the original annealed DNA fragment, enough to produce a detectable signal from the sequencing reaction. Several hundred thousand such beads (each containing a unique amplified fragment) are added to the surface of the 454 picotiter plate (PTP), which consists of single wells in the tips of fused fiber optic strands that hold each bead. Subsequently, much smaller magnetic and latex beads of 1 μm diameter, which are attached to the active enzymes needed for pyrosequencing, are added to surround the DNA-containing agarose beads in the PTP. Because the PTP also becomes the cell through which the pyrosequencing reactants flow, it is placed in the sequencer, and nucleotide and reagent solutions are delivered into it in a sequential fashion. Imaging of the light flashes from luciferase activity records which templates are adding that particular nucleotide (see http://www.454.com/enabling-technology/the-technology.asp for an explanation of the workflow and technology), and the light emitted is directly proportional to the amount of a particular nucleotide incorporated (up to the level of detector saturation). Hence, for runs of identical nucleotides (homopolymers), the signal can exceed the linear range of the detector, at which point insertion/deletion errors can occur in those reads. That said, the sequential flow of nucleotides almost entirely precludes the occurrence of substitution errors in the sequences. Placing a defined single-nucleotide pattern in the adaptor sequence that matches the sequence of the first four nucleotide flows enables the 454 analysis software to calibrate the level of light emitted from a single nucleotide incorporation, for purposes of the downstream base-calling analysis that occurs after the sequencer run is completed.
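To illustrate the flow-based readout just described, here is a minimal sketch of base calling in ‘flow space’. It assumes a simple repeating T-A-C-G flow order and intensities already calibrated so that a value of roughly n means n identical bases were incorporated in that flow; neither the flow order nor the calibration shown here is taken from the 454 analysis software itself.

```python
# Minimal sketch of pyrosequencing base calling in 'flow space'.
# Assumptions (not taken from the 454 software): a repeating T-A-C-G flow
# order and intensities normalised so that ~n means n bases incorporated.

FLOW_ORDER = "TACG"

def call_bases(flow_values):
    """Round each flow's calibrated intensity to a homopolymer length."""
    read = []
    for i, value in enumerate(flow_values):
        n = int(round(value))                  # homopolymer length for this flow
        read.append(FLOW_ORDER[i % 4] * n)     # zero -> no incorporation
    return "".join(read)

# An intermediate value such as 2.6 sits between two and three incorporations;
# rounding it either way is exactly the homopolymer indel error described above.
print(call_bases([1.0, 0.1, 2.1, 1.0, 0.0, 1.0, 2.6, 0.0]))   # -> TCCGACCC
```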
In this base-calling step, the signals recorded during the run for each reporting bead position on the PTP are translated into a sequence read, and several quality-checking steps remove poor-quality sequences. The current 454 instrument, the GS-FLX, produces an average read length of ∼250 bp per sample (per bead), with a combined throughput of ∼100 Mb of sequence data per 7-h run. By contrast, a single ABI 3730 programmed to sequence 24 × 96-well plates per day produces ∼440 kb of sequence data in 7 h, with an average read length of 650 bp per sample. Introduced in 2006, the Illumina Genome Analyzer is based on the concept of ‘sequencing by synthesis’ (SBS) to produce sequence reads of ∼32–40 bp from tens of millions of surface-amplified DNA fragments simultaneously (Figure 2). Starting from a mixture of single-stranded, adaptor oligo-ligated DNA fragments, the Illumina process uses a microfluidic cluster station to add these fragments to the surface of a glass flow cell. Each flow cell is divided into eight separate lanes, and the interior surfaces carry covalently attached oligos complementary to the specific adapters that are ligated onto the library fragments. Hybridization of these DNAs to the oligos on the flow cell occurs by an active heating and cooling step, followed by a subsequent incubation with reactants and an isothermal polymerase that amplifies the fragments in a discrete area or ‘cluster’ on the flow cell surfaces (see http://www.illumina.com/pages.ilmn?ID=203 for an animation of this process). The flow cell is placed into a fluidics cassette within the sequencer, where each cluster is supplied with polymerase and four differentially labeled fluorescent nucleotides that have their 3′-OH chemically inactivated to ensure that only a single base is incorporated per cycle. Each base incorporation cycle is followed by an imaging step to identify the incorporated nucleotide at each cluster and by a chemical step that removes the fluorescent group and deblocks the 3′ end for the next base incorporation cycle. At the end of the sequencing run (∼4 days), the sequence of each cluster, between 32 and 40 bp long as specified by the user, is computed and subjected to quality filtering to eliminate low-quality reads. A typical run yields ∼40–50 million such sequences. The SOLiD instrument (Applied Biosystems), which achieved commercial release in October 2007, uses a unique sequencing process catalyzed by DNA ligase. Each SOLiD (Sequencing by Oligo Ligation and Detection) run requires ∼5 days and produces 3–4 Gb of sequence data with an average read length of 25–35 bp. The specific process couples oligo adaptor-linked DNA fragments with 1-μm magnetic beads that are decorated with complementary oligos and amplifies each bead–DNA complex by emulsion PCR. After amplification, the beads are covalently attached to the surface of a specially treated glass slide that is placed into a fluidics cassette within the sequencer. In the SOLiD system, two slides are processed per run; one slide receives sequencing reactants while the second slide is being imaged. The ligation-based sequencing process starts with the annealing of a universal sequencing primer that is complementary to the SOLiD-specific adapters on the library fragments. The addition of a limited set of semi-degenerate 8mer oligonucleotides and DNA ligase is automated by the instrument. When a matching 8mer hybridizes to the DNA fragment sequence adjacent to the universal primer 3′ end, DNA ligase seals the phosphate backbone. After the ligation step, a fluorescent readout identifies the fixed base of the 8mer, which corresponds to either the fifth position or the second position, depending on the cycle number (see Table 2 for details). A subsequent chemical cleavage step removes the sixth through eighth bases of the ligated 8mer by attacking the linkage between bases 5 and 6, thereby removing the fluorescent group and enabling a subsequent round of ligation. The process occurs in steps that identify the sequence of each fragment at five-nucleotide intervals (Table 2), and the synthesized fragments that end at base 25 (or 35 if more cycles are performed) are removed by denaturation and washed away. A second round of sequencing initiates with the hybridization of a universal primer offset by one position (n-1), followed by subsequent rounds of ligation-mediated sequencing, and so on. An overview of the SOLiD workflow is presented at http://marketing.appliedbiosystems.com/images/Product/Solid_Knowledge/flash/102207/solid.html. The unique attribute of this ligation-based approach and the 8mer labeling is that an extra quality check of read accuracy is enabled, so-called ‘2 base encoding’. This approach relies on the known fixed nucleotide identities in the 8mer sequences to distinguish mis-calls from true nucleotide differences during the data analysis step (see Figure 3 for details).

Table 2. AB SOLiD cycle number descriptions

Cycle number | Universal primer position | Base positions identified | Probe set[a] | Positions interrogated
1 | n | 4,5 | NNNAA^NNN-fl | 5,10,15,20,25
2 | n-1 | 4,5 | NNNAT^NNN-fl | 4,9,14,19,24
3 | n-2 | 4,5 | NNNAC^NNN-fl | 3,8,13,18,23
4 | n | 1,2 | AANNN^NNN-fl | 2,7,12,17,22
5 | n-1 | 1,2 | ATNNN^NNN-fl | 1,6,11,16,21

[a] ^ indicates the position of cleavage on each 8mer; fl indicates the position of the fluorescent group on the 8mer.
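As a rough illustration of the two-base encoding idea, the sketch below uses the standard SOLiD colour code (with bases encoded as 2-bit values, the colour of each dinucleotide is the XOR of the two values) to show why a true single-base change alters two adjacent colours, whereas an isolated colour change is more consistent with a measurement error. This is illustrative only and is not the SOLiD analysis software; the example sequences are invented.

```python
# Sketch of two-base ('colour space') encoding as used conceptually by SOLiD.
# With A=0, C=1, G=2, T=3, the colour of a dinucleotide is the XOR of the two
# 2-bit values, which reproduces the standard 4x4 colour code.

BITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_colours(seq):
    """Encode a base sequence as the colours of its successive dinucleotides."""
    return [BITS[a] ^ BITS[b] for a, b in zip(seq, seq[1:])]

reference = "ATGGCA"
snp_allele = "ATGACA"          # a true single-base difference at the fourth base

print(to_colours(reference))   # [3, 1, 0, 3, 1]
print(to_colours(snp_allele))  # [3, 1, 2, 1, 1] -> two adjacent colours change
# A read in which only one colour disagrees with the reference is therefore
# more likely to contain a mis-call than a real nucleotide difference.
```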
Given the amount of data that can be produced from a single run and the cost per run ($5–$85 per Mb of sequence; see Table 1 for comparisons) for these instruments, one can envision a diverse range of applications. The following sections profile recently published studies that used these next-generation sequencers for applications including mutation discovery, metagenomic characterization, noncoding RNA and DNA–protein interaction discovery. The discovery of mutations that determine phenotypes is a fundamental premise of genetic research and will be tremendously facilitated by next-generation sequencing approaches, both for focused and genome-wide discovery. Conventional approaches to focused mutation discovery have used directed PCR to amplify selected genomic regions from individual samples, followed by capillary sequencing, alignment of the resulting sequence traces and algorithmic detection of sequence variants [5: Nickerson D.A. et al. Sequence-based detection of single nucleotide polymorphisms. Methods Mol. Biol. 2001; 175: 29-35; 6: Pao W. et al. EGF receptor gene mutations are common in lung cancers from ‘never smokers’ and are associated with sensitivity of tumors to gefitinib and erlotinib. Proc. Natl. Acad. Sci. U. S. A. 2004; 101: 13306-13311; 7: Wilson R.K. et al. Mutational profiling in the human genome. Cold Spring Harb. Symp. Quant. Biol. 2003; 68: 23-29; 8: Wood L.D. et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007; 318: 1108-1113; 9: Paez J.G.
et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science. 2004; 304: 1497-1500]. The PCR products can instead be sequenced directly using the Roche (454) sequencer, as published by Thomas et al. [10: Thomas R.K. et al. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat. Med. 2006; 12: 852-855], who demonstrated the high sensitivity of this platform for detecting rare variants, avoiding the noisy capillary sequence data that result from contaminating normal cells in tumor samples. A similar study by Dahl et al. [11: Dahl F. et al. Multigene amplification and massively parallel sequencing for cancer mutation discovery. Proc. Natl. Acad. Sci. U. S. A. 2007; 104: 9387-9392] emphasized the value of this approach for highly sensitive variant detection. Recently, Porreca et al. [12: Porreca G.J. et al. Multiplex amplification of large sets of human exons. Nat. Methods. 2007; 4: 931-936] published an extreme example of this approach by multiplexing the amplification of 10 000 human exons using primers released from a programmable microarray and sequencing them using a massively parallel approach. An interesting variation on the PCR-directed approaches is exemplified by recent work that selects regions from the genome by microarray-based capture technology [13: Albert T.J. et al. Direct selection of human genomic loci by microarray hybridization. Nat. Methods. 2007; 4: 903-905; 14: Hodges E. et al. Genome-wide in situ exon capture for selective resequencing. Nat. Genet. 2007; 39: 1522-1527]. Once the target sequences are reclaimed from the array by denaturation and amplified, they can be directly sequenced for mutation detection. Although such approaches hold great potential for mutation discovery, they are limited for complex and repetitive genomes (such as human) because of the inability to design specific primers or capture probes. Aside from a directed focus on selected regions of a genome of interest, whole-genome resequencing for variant discovery is significantly faster and less expensive using next-generation sequencers than with conventional approaches. Although some limitations are imposed on this approach by the short read lengths these technologies deliver (Box 1), one can readily discover mutations genome-wide with a single instrument run or a portion thereof (depending on genome size). For example, recent work in our group to discover single nucleotide polymorphisms and small (1–2 bp) indels in a Caenorhabditis elegans strain (CB4858) required only a single run of the Illumina sequencer (Hillier et al., unpublished data). The following section expands on mutation discovery efforts using next-generation sequencing in bacterial and viral isolates. Complete genome sequences are available for many disease-causing bacteria and viruses or for their laboratory strain equivalents (many of which are nonvirulent). Because the nature of such pathogens is to evolve continually by mutation and by exchanging sequences with one another, sequencing clinical isolates is of interest, especially if rapid data about antibiotic susceptibility and/or resistance and other virulence markers can be obtained.
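The value of deep sampling for rare variants, whether a mutation diluted by normal cells in a tumor sample or a minor member of a pathogen population, can be made concrete with a simple binomial calculation. The sketch below is illustrative only and is not the error-aware statistical model used in the studies cited above; the 5% variant fraction, the read depths and the detection threshold are assumptions chosen for the example.

```python
# Illustrative binomial model of rare-variant detection by deep sequencing.
# Assumed inputs: the variant is present in a fraction f of the sampled
# molecules and must be observed in at least k independent reads to be called.

from math import comb

def detection_probability(depth, variant_fraction, min_observations):
    """P(variant seen >= min_observations times) at the given read depth."""
    p_missed = sum(
        comb(depth, k) * variant_fraction**k * (1 - variant_fraction)**(depth - k)
        for k in range(min_observations)
    )
    return 1.0 - p_missed

# A variant carried by 5% of molecules is easily missed at modest depth but is
# essentially always sampled at the depths massively parallel platforms provide.
print(f"{detection_probability(30, 0.05, 3):.2f}")    # ~0.19
print(f"{detection_probability(500, 0.05, 3):.2f}")   # ~1.00
```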
One clear benefit of all next-generation platforms for strain-to-reference sequencing is that each DNA sequence in a library is obtained from a single genomic fragment, such that if there are rare variants in the clinical strain population, these can be detected by virtue of the depth of sampling obtained. By contrast, this is not possible when sequencing PCR products obtained directly from a clinical sample, as is commonly done in a clinical diagnostic setting, because the low signal strength from variant nucleotides would not be detectable on a capillary sequencer. Another benefit of obviating the conventional bacterial cloning intermediate is that the cloning bias often introduced during passage of foreign sequences through a bacterial host is eliminated. A typical project of this type involves culturing or otherwise isolating the microbe of interest and performing massively parallel sequencing of the isolate using one of the approaches described above, followed by a bioinformatics-based approach to (i) align the sequence reads back to the reference genome(s); (ii) evaluate them for single nucleotide and/or indel variants, and detect the presence of antibiotic resistance genes or pathogenicity islands by comparing novel sequences to those in the public databases; and (iii) evaluate any discovered variation in a functional and a biological context. Because the 454 platform has a read length appropriate for sequence assembly and a library construction approach that alleviates cloning bias, several studies using this approach have been published. In particular, two groups have produced increasingly sophisticated applications of HIV clinical isolate sequencing that identify rare members of the viral population [15: Hoffmann C. et al. DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Res. 2007; 35: e91; 16: Wang C. et al. Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res. 2007; 17: 1195-1201] and that identify HIV integration sites in the host genome [17: Wang G.P. et al. HIV integration site selection: analysis by massively parallel pyrosequencing reveals association with epigenetic modifications. Genome Res. 2007; 17: 1186-1194]. Further examples of the strain-to-reference application include the work of Francois et al. [18: Francois P. et al. Genome content determination in methicillin-resistant Staphylococcus aureus. Future Microbiol. 2007; 2: 187-198], who sequenced and analyzed the genomes of methicillin-resistant Staphylococcus aureus clinical isolates using the 454 platform, and of Poly et al. [19: Poly F. et al. Genome sequence of a clinical isolate of Campylobacter jejuni from Thailand. Infect. Immun. 2007; 75: 3425-3433], who sequenced a clinical isolate of Campylobacter jejuni. Indeed, Denno et al. [20: Denno D.M. et al. Explaining unexplained diarrhea and associating risks and infections. Anim. Health Res. Rev. 2007; 8: 69-80] have suggested that studies of the etiology of unexplained diarrhea should include the application of massively parallel sequencing to identify bacterial and viral species as potential causative agents.
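As a rough sketch of step (ii) in the workflow above, the toy code below tallies a pileup from reads that are assumed to be already aligned to a reference (step (i) is performed by dedicated short-read alignment software in practice) and reports positions where a non-reference base is well supported. The depth and allele-fraction thresholds are arbitrary placeholders, not values from any published pipeline.

```python
# Toy single-nucleotide variant detection from already-aligned reads.
# This illustrates the idea only; real analyses use dedicated aligners and
# statistically calibrated variant callers.

from collections import Counter, defaultdict

def call_snvs(reference, alignments, min_depth=3, min_fraction=0.5):
    """alignments: list of (start_position, read_sequence) tuples, 0-based."""
    pileup = defaultdict(Counter)
    for start, read in alignments:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1          # tally observed bases

    variants = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        ref_base = reference[pos]
        for base, n in counts.items():
            if base != ref_base and depth >= min_depth and n / depth >= min_fraction:
                variants.append((pos, ref_base, base, n, depth))
    return variants

reference = "ACGTACGTACGT"
reads = [(0, "ACGTACGT"), (0, "ACGAACGT"), (2, "GAACGTAC")]
print(call_snvs(reference, reads))   # -> [(3, 'T', 'A', 2, 3)]
```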
A remarkable study by an international consortium used 454 sequencing of Mycobacterium tuberculosis to identify the target of a diarylquinoline drug that potently inhibits both drug-sensitive and drug-resistant strains of the pathogen [21: Andries K. et al. A diarylquinoline drug active on the ATP synthase of Mycobacterium tuberculosis. Science. 2005; 307: 223-227]. Based on these early reports, it is quite likely that our understanding of the spectrum of genome variation within clinical isolates will be greatly enhanced in the near future. This knowledge, in turn, should lead to improved diagnostics, monitoring and treatments. Although sequencing a single clinical or cultured isolate of a microbe or virus seems a straightforward application of next-generation platforms, the throughput capability of these instruments has had a significant impact on a related and powerful field of endeavor known as ‘metagenomics’. Metagenomics essentially entails brute-force sequencing of DNA fragments obtained from an uncultured, unpurified microbial and/or viral population, followed by bioinformatics-based analyses that attempt to answer the question ‘Who's there?’ by comparing the metagenomic sequences obtained with all other sequenced species and isolates. Because these sequence reads are present in rough proportion to the population frequency of each microbe, inferences about relative abundance can be made. Although metagenomic studies have been accomplished with sequencing data from conventional capillary platforms, the associated cost was (and remains) prohibitive for sequencing deeply into highly complex populations. Furthermore, the need for cloning before sequencing eliminates the metagenomic signatures of certain microbial and bacteriophage sequences that are not carried stably by E. coli. As such, the most definitive early metagenomic studies were of restricted populations from hostile environments [22: Tyson G.W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004; 428: 37-43]. Humans live in symbiosis with billions of microbial species that inhabit both the outer and inner surfaces of the body.
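To sketch the ‘Who's there?’ bookkeeping in purely illustrative terms, the toy code below assigns each read to the mock reference it matches and reports relative abundances. Real metagenomic analyses compare reads against large public sequence databases with alignment or profile-based tools rather than the exact substring matching used here, and the taxon names and sequences are invented for the example.

```python
# Toy metagenomic classification: assign reads to mock reference genomes and
# estimate relative abundance from read counts. Illustrative only; the
# reference sequences and taxon names below are invented placeholders.

from collections import Counter

def classify(reads, references):
    """references: dict mapping taxon name -> genome sequence (mock data)."""
    counts = Counter()
    for read in reads:
        hits = [name for name, genome in references.items() if read in genome]
        if len(hits) == 1:                     # keep unambiguous assignments only
            counts[hits[0]] += 1
    total = sum(counts.values()) or 1
    return {name: round(n / total, 2) for name, n in counts.items()}

mock_references = {"taxon_A": "ACGTACGTGGCCTTAA", "taxon_B": "TTGGCCAATACGCGCG"}
mock_reads = ["ACGTACGT", "GGCCTTAA", "TACGCGCG"]
print(classify(mock_reads, mock_references))   # {'taxon_A': 0.67, 'taxon_B': 0.33}
```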