Sequential sequencing by synthesis and the next-generation sequencing revolution
2023; Elsevier BV; Volume: 41; Issue: 12; Language: English
10.1016/j.tibtech.2023.06.007
ISSN 0167-9430
Authors: Mathias Uhlén, Stephen R. Quake
Topic(s): Microbial Community Ecology and Physiology
Abstract: Next-generation sequencing (NGS) involving massively parallel DNA analysis has made an enormous impact on life science, medicine, and biotechnology, through a multitude of applications. The vast majority of the major NGS systems are based on the concept of 'sequencing by synthesis' (SBS) with sequential detection of nucleotide incorporation using an engineered DNA polymerase. The basic principles of SBS include attachment of DNA fragments to a solid support, conversion to a single-strand template and the annealing of a primer, the incorporation of complementary nucleotides by a polymerase, and detection of this incorporation. The development of NGS spans several decades of innovations, from early systems using natural nucleotides to later systems for massively parallel sequencing using reversible fluorescent nucleotides.

There is no question that massively parallel sequencing (MPS) technology, often referred to as NGS, has had a unique and huge impact on life science research [1. Gibbs R.A. The Human Genome Project changed everything. Nat. Rev. Genet. 2020; 21: 575-576, 2. McGinn S. Gut I.G. DNA sequencing – spanning the generations. New Biotechnol. 2013; 30: 366-372], with a large number of studies using this technology currently published every day. The number of DNA sequences in public databases has exploded since MPS by synthesis was commercially introduced in 2005 [3. Margulies M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005; 437: 376-380]. This has also led to a dramatic decrease in the cost and effort required to sequence whole genomes.
This is illustrated by the estimated cost of the first human genome, published in 2001 using electrophoretic technology for base calling, which was US$2–3 billion, while today it is possible to sequence a human genome with MPS for less than US$1000 [1], a cost reduction by a factor of over 1 million. This is remarkable and it is hard to find a similar example in scientific history. This has led to an explosion of scientific data and the start of a new era in medicine and biology driven by 'big data' and data-driven research. In the field of genetics, the creation of maps covering the genetic diversity in various human populations [4. Gudbjartsson D.F. et al. Sequence variants from whole genome sequencing a large group of Icelanders. Sci. Data. 2015; 2: 150011] has greatly increased our understanding of the relationship between genes and diseases. Similar maps across the 'tree of life' [5. Hug L.A. et al. A new view of the tree of life. Nat. Microbiol. 2016; 1: 16048] have increased our understanding of the parts list of human building blocks. The discovery of a new human ancestor, the Denisovans [6. Meyer M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012; 338: 222-226], was enabled by NGS, and population studies showed widespread remains of both Neanderthal and Denisovan DNA in our genomes. The technology has also entered clinical practice, both to understand and to treat cancers [7. Mardis E.R. Wilson R.K. Cancer genome sequencing: a review. Hum. Mol. Genet. 2009; 18: R163-R168] and to allow diagnosis in children with unknown disease-causing mutations, resulting in the design of drug treatments [8. Stranneheim H.
et al. Integration of whole genome sequencing into a healthcare setting: high diagnostic rates across multiple clinical entities in 3219 rare disease patients. Genome Med. 2021; 13: 40, 9. Marwaha S. et al. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med. 2022; 14: 23]. In the following, the various concepts enabling the rapid development of NGS are discussed.

The objective of SBS is to determine the sequence of a DNA sample by detecting, in a sequential manner, the incorporation of nucleotides using an engineered DNA polymerase (Figure 1). An engineered polymerase is used to synthesize a copy of a single strand of DNA and the incorporation of each nucleotide is monitored. The key parts are highly similar for all embodiments of SBS and include the following:
(i) attachment of the DNA to be sequenced to a solid support, usually combined with amplification of the DNA to enhance the subsequent signal;
(ii) generation of single-stranded DNA on the solid support;
(iii) primer-dependent incorporation of complementary nucleotides using an engineered polymerase; and
(iv) detection of the incorporated nucleotide.
Steps (iii) and (iv) are repeated and the sequence is assembled from the signals obtained in step (iv). This principle of SBS has been used for almost all MPS efforts and it has contributed to the vast majority of sequence information generated during the past decade [10. Heather J.M. Chain B. The sequence of sequencers: the history of sequencing DNA. Genomics. 2016; 107: 1-8]. The historical development of sequential SBS has been reviewed by others [10] and is briefly summarized here. The concept was first described in 1993 [11. Nyren P.
et al. Solid phase DNA minisequencing by an enzymatic luminometric inorganic pyrophosphate detection assay. Anal. Biochem. 1993; 208: 171-175] in the form of a technique later known as pyrosequencing. In this case, nucleotide incorporation was detected by measuring the pyrophosphate released upon incorporation. In the first publication [11], all of the key concepts of SBS were introduced, including the amplification of DNA to enhance the subsequent signal and attachment of the DNA to be sequenced to a solid support, the generation of single-stranded DNA on the solid support, the incorporation of nucleotides using an engineered polymerase, and light detection of the incorporated nucleotide. This paper also outlined a vision of MPS: 'Automated on-line methods with multiple samples in parallel can be envisioned'. In a follow-up article [12. Ronaghi M. et al. Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 1996; 242: 84-89], the concept was further developed, and a few years later Ronaghi, Uhlén and Nyrén [13. Ronaghi M. et al. A sequencing method based on real-time pyrophosphate. Science. 1998; 281: 365] showed that non-incorporated nucleotides could be removed with a fourth enzyme (apyrase), allowing SBS to be performed without the need to wash away non-incorporated nucleotides. A commercial instrument based on SBS (called Pyrosequencing) was launched in 2000 with all key concepts for SBS with real-time detection and with a throughput of 96 samples in parallel [14. Harrington C.T. et al. Fundamentals of pyrosequencing. Arch. Pathol. Lab. Med. 2013; 137: 1296-1303].
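The pyrosequencing principle described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the published protocol: the function names, the cyclic dispensing order `TACG`, and the idealized noise-free signal are illustrative choices, and the enzyme cascade that converts pyrophosphate into light is reduced to a count of incorporated bases. Because natural nucleotides carry no terminating group, a run of identical template bases is extended in a single dispensation and appears as one proportionally larger signal.

```python
# Toy model of pyrosequencing with natural (non-terminating) nucleotides.
# Assumptions: one nucleotide type is dispensed per flow in a fixed cyclic
# order, and the light signal equals the number of bases incorporated.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrosequence(template, flow_order="TACG", max_flows=200):
    """Return a flowgram: a list of (dispensed nucleotide, signal) pairs.

    `template` is given in the order in which the polymerase reads it.
    """
    position = 0
    flowgram = []
    for flow in range(max_flows):
        if position >= len(template):
            break
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        # The polymerase keeps extending while the dispensed nucleotide
        # pairs with the next template base (homopolymer runs extend at once).
        while (position < len(template)
               and COMPLEMENT[template[position]] == nucleotide):
            incorporated += 1
            position += 1
        # Unincorporated nucleotides are assumed to be degraded (apyrase)
        # before the next dispensation, so no wash step is modeled.
        flowgram.append((nucleotide, incorporated))
    return flowgram


def decode_flowgram(flowgram):
    """Rebuild the synthesized strand from idealized integer signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flowgram)


if __name__ == "__main__":
    flows = pyrosequence("AACGTTT")
    print(flows)
    print(decode_flowgram(flows))
```

In this idealized model the signal for a homopolymer is an exact integer; in a real instrument the run length has to be estimated from light intensity, which becomes harder as runs get longer.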
A modified version of the Pyrosequencing instrument is still available and is used for many applications, including DNA methylation/epigenetics [15. Claus R. et al. A systematic comparison of quantitative high-resolution DNA methylation analysis and methylation-specific PCR. Epigenetics. 2012; 7: 772-780, 16. Sadikovic B. et al. Clinical epigenomics: genome-wide DNA methylation analysis for the diagnosis of Mendelian disorders. Genet. Med. 2021; 23: 1065-1074] and forensic studies [17. Ghemrawi M. et al. Pyrosequencing: current forensic methodology and future applications – a review. Electrophoresis. 2023; 44: 298-312].

The first next-generation sequencers (Figure 2) were based on pyrosequencing chemistry and were commercialized by Rothberg and coworkers [3] from the company 454 Life Sciences in the USA. They showed that sequencing could be performed in a highly parallel manner, and in the paper they described the successful sequencing of the genome of Mycoplasma genitalium, which is also the first description of whole-genome sequencing using MPS. Many important applications were enabled by this pioneering instrument [18. Rothberg J.M. Leamon J.H. The development and impact of 454 sequencing. Nat. Biotechnol. 2008; 26: 1117-1124], including the analysis of the Neanderthal genome [19. Green R.E. et al. Analysis of one million base pairs of Neanderthal DNA. Nature. 2006; 444: 330-336] in collaboration with Pääbo and coworkers at the Max Planck Institute in Leipzig, Germany. Others used this approach to study microbial diversity in the deep sea [20. Sogin M.L.
et al. Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc. Natl. Acad. Sci. U. S. A. 2006; 103: 12115-12120], to study the microbial genomics of archaea in soil [21. Leininger S. et al. Archaea predominate among ammonia-oxidizing prokaryotes in soils. Nature. 2006; 442: 806-809], to study the possible causes of honeybee hive collapse [22. Cox-Foster D.L. et al. A metagenomic survey of microbes in honey bee colony collapse disorder. Science. 2007; 318: 283-287], and to perform the first single-cell genome sequencing of an unculturable organism [23. Marcy Y. et al. Dissecting biological "dark matter" with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. Proc. Natl. Acad. Sci. U. S. A. 2007; 104: 11889-11894]. This approach was also used to determine one of the first individual human genome sequences in 2008 [24. Wheeler D.A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008; 452: 872-876].

A further improvement of MPS by synthesis was the development of reversible and fluorescently labeled terminators to address the issue of homopolymers, which were less accurately determined by the Pyrosequencing detection system. The use of fluorescently labeled nucleotides for sequencing dates to the 1980s, when they were used in conventional electrophoretic sequencing. Reversible terminators were first described by Metzker and colleagues [25. Metzker M.L. et al. Termination of DNA synthesis by novel 3′-modified-deoxyribonucleoside 5′-triphosphates. Nucleic Acids Res. 1994; 22: 4259-4267], who showed that 3′-modified nucleotides could be used for base-specific termination, with photolytic removal of the 3′-protecting group.
A few years later, the two UK chemists Balasubramanian and Klenerman filed a patent [26. Balasubramanian, S. and Klenerman, D. Solexa Ltd. Arrayed biomolecules and their use in sequencing, WO 00/06770] describing 'Arrayed biomolecules and their use in sequencing' in which they proposed the use of fluorescently labeled nucleotides combined with reversible terminators to allow SBS. Around the same time, the USA-based Quake group was pursuing a similar strategy of fluorescence photobleaching sequencing, which they described in a grant proposal to the National Institutes of Health (NIH) (https://grantome.com/grant/NIH/R29-HG001642-04). Balasubramanian and Klenerman initially aimed to achieve single-molecule detection but later abandoned this strategy, and the concept was combined with in vitro-amplified DNA on a solid support [27. Kawashima, E. et al. Method of nucleic acid amplification, WO 98/44151], whereas the Quake group began with the idea of using multiple copies of a DNA template on a surface but then became the first to demonstrate single-molecule SBS [28. Braslavsky I. et al. Sequence information can be obtained from single DNA molecules. Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 3960-3964]. An instrument called the 'Genome Analyzer' for MPS by synthesis was subsequently launched in 2006 by the company Solexa, founded by Balasubramanian and Klenerman. The modified SBS strategy used reversible terminators and involved: immobilizing the sequencing templates and primers on a solid support; primer extension by polymerase-mediated incorporation of a nucleotide carrying a 3′-blocking group; washing away the unincorporated nucleotides and detecting the color of the fluorophore carried by the extended base to identify the incorporated nucleotide; and removing the fluorescent tag and the 3′-blocking group, after which the steps are repeated to incorporate the next nucleotide.
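The reversible-terminator cycle described above can be sketched in a few lines of code. This is a hedged illustration only: the dye colors assigned to each base and the function name `run_reversible_terminator_cycles` are hypothetical, and real base calling works on noisy cluster images rather than on a perfect color label. The point of the sketch is that the 3′-blocking group limits extension to exactly one base per cycle, so a homopolymer of length n is read in n separate cycles.

```python
# Toy model of SBS with reversible, fluorescently labeled terminators.
# Assumptions: the dye colors below are arbitrary placeholders and every
# cycle is chemically perfect (no phasing, no missed incorporations).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE_COLOR.items()}


def run_reversible_terminator_cycles(template, n_cycles):
    """Return the bases called after `n_cycles` sequencing cycles.

    `template` is given in the order in which the polymerase reads it.
    """
    extended = []  # bases added to the primer so far
    calls = []
    for _ in range(n_cycles):
        if len(extended) == len(template):
            break  # end of template reached
        # (1) Incorporation: the 3' block permits exactly one addition.
        base = COMPLEMENT[template[len(extended)]]
        extended.append(base)
        # (2) Wash away unincorporated nucleotides, then image the dye.
        observed_color = DYE_COLOR[base]
        calls.append(COLOR_TO_BASE[observed_color])
        # (3) Cleave the dye and the 3' block so the next cycle can extend.
    return "".join(calls)


if __name__ == "__main__":
    print(run_reversible_terminator_cycles("AACGTTT", 7))
    print(run_reversible_terminator_cycles("AACGTTT", 3))
```

Read length in this model is simply the number of cycles performed, which is why the number of chemistry cycles sets the read length on terminator-based instruments.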
Solexa was acquired by Illumina in 2007, and although few peer-reviewed articles were published from the company, a paper was published in 2008 [29. Bentley D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008; 456: 53-59] reporting on the use of their instrument for human genome sequencing. Similarly, the single-molecule sequencing approach from the Quake group was commercialized by Helicos Biosciences, which published a paper demonstrating viral genome sequencing with their instrument [30. Harris T.D. et al. Single-molecule DNA sequencing of a viral genome. Science. 2008; 320: 106-109], followed by a publication from the same group [31. Pushkarev D. et al. Single-molecule sequencing of an individual human genome. Nat. Biotechnol. 2009; 27: 847-850] determining the sequence of an entire human genome. Table 1 shows a short summary of some of the key milestones in the development of the concept of NGS.

Table 1. Key publications and patents in the field of SBS
Year | Description | Refs
1993 | The concept of SBS | [11]
1994 | The use of reversible terminators | [25]
1998 | Solution-based Pyrosequencing | [13]
1998 | Array of molecules on surface (patent) | [26]
1998 | Clonal amplification on surface (patent) | [27]
1999 | In situ localized amplification | [36. Mitra R.D. Church G.M. In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res. 1999; 27: e34]
2003 | Single-molecule sequencing | [28]
2003 | Fluorescent in situ sequencing | [64. Mitra R.D. et al. Fluorescent in situ sequencing on polymerase colonies. Anal. Biochem. 2003; 320: 55-65]
2005 | The concept of MPS | [3]
2006 | Analysis of Neanderthal genome using SBS | [19]
2008 | SBS using reversible terminators | [29]
2008 | Transcriptomics analysis (RNA-seq) | [65. Mortazavi A. et al. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Methods. 2008; 5: 621-628]
2010 | 1000 Genomes Project published | [66. 1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature. 2010; 467: 1061-1073]
2014 | Single-cell genomics (transcriptomics) | [62. Shalek A.K. et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature. 2014; 510: 363-369]
2016 | Population-based genome sequencing (Iceland population) | [4]
2016 | Spatial transcriptomics | [63. Stahl P.L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016; 353: 78-82]
2017 | The Human Cell Atlas project | [67. Rozenblatt-Rosen O. et al. The Human Cell Atlas: from vision to reality. Nature. 2017; 550: 451-453]
2019 | The 25 000 cancer genomes project | [43. Zhang J. et al. The International Cancer Genome Consortium Data Portal. Nat. Biotechnol. 2019; 37: 367-369]
2021 | Next-generation blood profiling using proximity extension assay | [50. Zhong W. et al. Next generation plasma proteome profiling to monitor health and disease. Nat. Commun. 2021; 12: 2493]
2021 | The 100 000 Genomes Project for rare diseases | [44. 100,000 Genomes Project Pilot Investigators et al. 100,000 Genomes Pilot on rare-disease diagnosis in health care – preliminary report. N. Engl. J. Med. 2021; 385: 1868-1880]
2023 | Zoonomia: mammalian species genomes | [45. Kaplow I.M. et al. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science. 2023; 380: eabm7993]

For methods that do not use single-molecule sequencing, it became crucial to perform controlled amplification of individual templates. Several PCR-based alternatives for amplification of DNA have been successfully explored.
(i) Solid-phase sequencing. This strategy depended on PCR-amplified DNA fragments bound to a solid support using the biotin-streptavidin system.
This technology had already been described in the 1980s [32. Stahl S. et al. Solid phase DNA sequencing using the biotin-avidin system. Nucleic Acids Res. 1988; 16: 3025-3038, 33. Hultman T. et al. Direct solid phase sequencing of genomic and plasmid DNA using magnetic beads as solid support. Nucleic Acids Res. 1989; 17: 4937-4946] and the approach was later used to develop Pyrosequencing, which was launched commercially in 2000. A variant of this concept was described by the Syvänen group [34. Pastinen T. et al. Minisequencing: a specific tool for DNA analysis and diagnostics on oligonucleotide arrays. Genome Res. 1997; 7: 606-614] in which an oligonucleotide array was used for mini-sequencing to allow multiplex detection of mutations.
(ii) Emulsion PCR. This strategy depended on amplifying the template in microdroplets created by an emulsion technique. This concept was developed by the company 454 and was used to develop the 454/Pyrosequencing platform, which was published [3] and launched in 2005.
(iii) Bridge-PCR. This strategy, also called in vitro molecular cloning, enables in situ localized amplification of DNA molecules using a solid support with synthesized primers. This was introduced in a patent application by Kawashima and coworkers [27] from the company Serano and the concept was further developed by the Church group to allow amplification of a single DNA molecule to form a 'polony' [35. Shendure J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science. 2005; 309: 1728-1732, 36. Mitra R.D. Church G.M.
In situ localized amplification and contact replication of many individual DNA molecules. Nucleic Acids Res. 1999; 27: e34]. The bridge-PCR technology was later used for the launch of the Genome Analyzer by Solexa in 2006. For most of the SBS platforms, the availability of an amplified DNA template on a solid support was one of the key concepts that enabled the introduction of MPS.

After the introduction of the first MPS system in 2005 [3], many alternative systems were developed, including the Solexa/Illumina system based on reversible terminators [29], Helicos for single-molecule sequencing using reversible fluorescently labeled terminators [28], and PacBio for sequencing long reads using single DNA molecule read-out [37. Rhoads A. Au K.F. PacBio sequencing and its applications. Genom. Proteomics Bioinform. 2015; 13: 278-289]. An alternative approach to SBS, called nanopore sequencing, was also developed [38. Branton D. et al. The potential and challenges of nanopore sequencing. Nat. Biotechnol. 2008; 26: 1146-1153]. Later, the reversible terminator strategy was combined with 'nanoballs' [39. Porreca G.J. Genome sequencing on nanoballs. Nat. Biotechnol. 2010; 28: 43-44], also based on SBS.
Table 2 shows some of the instruments for SBS, and it summarizes the amplification strategy used and the choice of reagents (nucleotides) in the assay. All of these use the concept of SBS.

Table 2. Examples of instruments using sequential SBS
Year | Instrument/system | Amplification | Nucleotides | Comment
2000 | PSQ96 (Pyrosequencing) | Magnetic beads | Natural | SBS concept
2005 | 454 Instrument | Microdroplets | Natural | MPS concept
2006 | Genome Analyzer (Solexa) | Bridge-PCR | Reversible | Improve homopolymers
2007 | tSMS (Helicos) | Single molecule | Reversible | Single molecule
2010 | HiSeq 2000 (Illumina) | Bridge-PCR | Reversible | 200 Gb per run
2010 | PacBio RS | Circular DNA | Fluorescent | Extreme long reads
2010 | Ion Torrent instrument | Microdroplets | Natural | Semiconductor detector
2016 | GemCode (10X Genomics) | Single molecule | Reversible | Single-cell analysis
2019 | DNBSeq-T7 (MGI) | Nanoballs | Reversible | 6000 Gb per run
2022 | NovaSeq X (Illumina) | Bridge-PCR | Reversible | 6000 Gb per run

MPS can also be performed without SBS, but with the same objective to allow sequence analysis to be performed in a scalable and parallel manner. Examples of this include 'sequencing-by-ligation' [40. Pfeifer G.P. Riggs A.D. Genomic sequencing by ligation-mediated PCR. Mol. Biotechnol. 1996; 5: 281-288] and 'sequencing by hybridization' [41. Drmanac S. et al. Accurate sequencing by hybridization for DNA diagnostics and individual genomics. Nat. Biotechnol. 1998; 16: 54-58], as well as the nanopore system, which relies on the monitoring of nucleotides passing a protein nanopore. However, the concept of SBS has become the leading analytical platform for NGS, with the Illumina instruments completely dominating the genomics field during the past 10 years.

The impact of MPS technology on science has been overwhelming and the applications can roughly be divided into three separate research fields. MPS systems have allowed genomes to be assembled in an efficient manner.
The approach usually relies on the concept of 'shotgun' sequencing, introduced in 1994 by the Myers and Venter groups [42. Adams M.D. et al. A model for high-throughput automated DNA sequencing and analysis core facilities. Nature. 1994; 368: 474-475] in the context of Sanger sequencing. In this approach, both random sequence fragments and paired ends from larger molecules are sequenced and assembled by bioinformatics to yield the complete genome. This approach, and use of the SBS concept, has led to an explosion of sequenced genomes ranging from bacteria, fungi, and plants to mammalian species, including the sequencing of whole populations of humans [4]. The strategy was used by the International Cancer Genome Consortium (ICGC) project to sequence 25 000 cancer genomes [43] and in the 100 000 Genomes Project to analyze the whole genomes of people affected by rare disease [44]. Recently, the analytical platform has also been used to explore the genetic variability across a large number of mammalian species and their relationship with complex phenotypes [45].

Cost-effective technology to sequence short stretches of DNA has enabled a large number of new applications with huge impacts in life science.
In these applications, MPS is used to analyze sequences that are subsequently compared with reference sequences, and thus the origin of the sequences can be inferred. By counting the occurrences of a particular sequence, it is possible to quantitatively estimate biological phenomena. An example of this application is transcriptomics [RNA-seq (see Glossary)] [46. Watson A. et al. Technology for microarray analysis of gene expression. Curr. Opin. Biotechnol. 1998; 9: 609-614], in which the RNA profile in organs, tissues, and cells is estimated. Another example is microbiome analysis [47. Falony G. et al. Population-level analysis of gut microbiome variation. Science. 2016; 352: 560-564], in which microbial populations in various environmental niches, including the human gut, are studied. In addition, there is a long tail of various applications, including epigenetic studies involving methylation analysis, studies of transcription factor binding sites (ChIP-seq), and assay for transposase-accessible chromatin using sequencing (ATAC-seq) to determine chromatin accessibility across the genome [48. Yan F. et al. From reads to insight: a hitchhiker's guide to ATAC-seq data analysis. Genome Biol. 2020; 21: 22].

An interesting application of SBS is the use of 'tag' library molecules [49. Schuster S.C. Next-generation sequencing transforms today's biology. Nat. Methods. 2008; 5: 16-18] based on synthetic oligonucleotide sequences ('barcodes'). These applications take advantage of MPS as a way to quantify the number of tags in a given sample. This technology is now used in a number of new applications, including 'next-generation blood profiling' [50, 51. Darmanis S.
et al. ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS One. 2011; 6: e25583] allowing thousands of proteins to be simultaneously detected from a small drop of blood.

A common strategy is to combine barcode counting with reference-based sequencing. One example of this is single-cell genomics [52. Sandberg R. Entering the era of single-cell transcriptomics in biology and medicine. Nat. Methods. 2014; 11: 22-24], in which reference-based sequencing is used for transcript counting and the barcode is used to identify the cell origin of a sample. Another example is spatial transcriptomics [53. Chen A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022; 185: 1777-1792.e21], where reference-based sequencing is again used for transcript counting, while the barcode is used to localize RNA molecules on the tissue sample.

In summary, the impact of sequential SBS in the life science field is enormous. The concept has allowed the rapid development of NGS platforms and thus transformed the field of life science, contributing to a dramatic expansion in our understanding of human health and disease as well as our understanding of biology and ecology. The vast amount of data has led to a new era of data-driven life science in which machine learning and other AI-based methods are and will be of increased importance to expand our life science knowledge base. The 25 000 cancer genomes project [43] and the 100 000 Genomes Project [44. 100,000 Genomes Project Pilot Investigators et al. 100,000 Genomes Pilot on rare-disease diagnosis in health care – preliminary report. N. Engl. J. Med.
2021; 385: 1868-1880] are just two recent examples among many of how rapidly the field is moving forward. The trend of these 'big science' projects generating open-access data makes it possible for researchers around the world to integrate omics data and analyze results based on both externally and internally generated data. The exponential growth of sequencing data will most likely continue for many years, driven by even lower costs of whole-genome sequencing (see Outstanding questions). It is possible to envision whole-genome sequencing of all individuals at birth to facilitate later treatment choices by physicians based on personalized medicine strategies; however, such scenarios depend on ethical considerations and the safety of the information on the individual, and thus common rules and regulations to safeguard genome data must be in place [54. McGuire A.L. et al. Research ethics and the challenge of whole-genome sequencing. Nat. Rev. Genet. 2008; 9: 152-156]. It is also likely that a large proportion of cancer patients will have their tumor genome sequenced to adapt treatment to the genetic make-up of the respective tumor [55. Rosenquist R. et al. Clinical utility of whole-genome sequencing in precision oncology. Semin. Cancer Biol. 2022; 84: 32-39].

Another interesting trend is the integration of SBS with various complementary technologies, sometimes referred to as multiomics [56. Hasin Y. et al. Multi-omics approaches to disease. Genome Biol. 2017; 18: 83]. Advances in proteomics [57. Cox J. Mann M. Is proteomics the new genomics? Cell. 2007; 130: 395-398], metabolomics [58. Liu X. Locasale J.W. Metabolomics: a primer. Trends Biochem. Sci.
2017; 42: 274-284], and bioimaging [59. Ellenberg J. et al. A call for public archives for biological image data. Nat. Methods. 2018; 15: 849-854] combined with NGS data make it possible to study biology and medicine in ways that were impossible only a few years ago. A recent example is the development of next-generation blood protein profiling, in which proximity extension assays [50. Zhong W. et al. Next generation plasma proteome profiling to monitor health and disease. Nat. Commun. 2021; 12: 2493, 51. Darmanis S. et al. ProteinSeq: high-performance proteomic analyses by proximity ligation and next generation sequencing. PLoS One. 2011; 6: e25583] and SomaScan assays [60. Gold L. et al. Advances in human proteomics at high scale with the SOMAscan proteomics platform. New Biotechnol. 2012; 29: 543-549, 61. Candia J. et al. Assessment of variability in the plasma 7k SomaScan proteomics assay. Sci. Rep. 2022; 12: 17147] now allow thousands of proteins to be analyzed in a quantitative manner starting from only a small drop of blood.

A recent trend is the ongoing technical improvement of single-cell genomics [62. Shalek A.K. et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature. 2014; 510: 363-369] and spatial transcriptomics [53. Chen A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022; 185: 1777-1792.e21, 63. Stahl P.L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science.
2016; 353: 78-82] to allow more depth and breadth in the investigation of the transcriptional landscape in cells, tissues, and organs. The technical improvements will most likely continue and thus advance our holistic understanding of the functional building blocks of life. In summary, SBS technology has led to a revolution in the field of DNA and RNA sequencing and, in doing so, has transformed life science research and greatly contributed to an expansion of our knowledge of biology and medicine.

Outstanding questions
Will the trend of lower costs for DNA sequencing continue and will this result in whole-genome sequencing of whole populations?
Will genome sequencing facilitate treatment choices by physicians based on personalized medicine strategies?
Is it possible to overcome the ethical concerns with whole-genome sequencing regarding the safety and security of the individual genome information?

We are grateful to Per-Åke Nygren for critical reading and comments. The funding from the Knut and Alice Wallenberg Foundation is acknowledged.

M.U. is the cofounder of the company Pyrosequencing (Sweden) and S.R.Q. is the cofounder of Helicos Biosciences (USA).

1 billion bases (nucleotides) of DNA sequence data.
sequencing of RNA using NGS.
Reference(s)