Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data

Review, Open access, Peer reviewed

2021; Elsevier BV; Volume: 38; Issue: 3; Language: English

10.1016/j.tig.2021.09.001

ISSN

1362-4555

Authors

Yuk Kei Wan, Christopher Hendra, Ploy N. Pratanwanich, Jonathan Göke

Topic(s)

RNA and protein synthesis mechanisms

Abstract

Nanopore sequencing accuracy has increased to 98.3% as new-generation basecallers replace early-generation hidden Markov model basecalling algorithms with neural network algorithms. Machine learning methods can classify sequences in real time, allowing targeted sequencing with nanopore's ReadUntil feature. Machine learning and statistical testing tools can detect DNA modifications by analyzing ion current signals from nanopore direct DNA sequencing. Nanopore direct RNA sequencing profiles RNAs with their modifications retained, which influences the ion current signals emitted from the nanopore. Machine learning and statistical testing tools analyze ion current signals from direct RNA sequencing, enabling RNA modification detection, RNA secondary structure prediction, and poly(A) tail length estimation.

Nanopore sequencing provides signal data corresponding to the nucleotide motifs sequenced. Through machine learning-based methods, these signals are translated into long-read sequences that overcome the read size limit of short-read sequencing. However, analyzing the raw nanopore signal data provides many more opportunities beyond just sequencing genomes and transcriptomes: algorithms that use machine learning approaches to extract biological information from these signals allow the detection of DNA and RNA modifications, the estimation of poly(A) tail length, and the prediction of RNA secondary structures. In this review, we discuss how developments in machine learning methodologies contributed to more accurate basecalling and lower error rates, and how these methods enable new biological discoveries. We argue that direct nanopore sequencing of DNA and RNA provides a new dimensionality for genomics experiments and highlight challenges and future directions for computational approaches to extract the additional information provided by nanopore signal data.

High-throughput short-read sequencing has played a pivotal role in broadening our understanding of biology. Short-read sequencing technologies have advanced the understanding of genetic diversity [1. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature. 2015; 526: 68-74, 2. Wu D. et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell. 2019; 179: 736-749.e15], provided insights into transcriptomes and cell profiles in healthy populations [3. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013; 45: 580-585, 4. Regev A. et al. The Human Cell Atlas. eLife. 2017; 6: e27041], and helped decipher disease biology [5. Weinstein J.N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013; 45: 1113-1120, 6. PCAWG Transcriptome Core Group et al. Genomic basis for RNA alterations in cancer. Nature. 2020; 578: 129-136, 7. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020; 578: 82-93, 8. Hoadley K.A. et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell. 2018; 173: 291-304.e6]. On top of the nucleotide sequences are epigenetic modifications that influence gene expression [9. Allis C.D., Jenuwein T. The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 2016; 17: 487-500] and epitranscriptomic (see Glossary) modifications that impact RNA processing, stability, and translation efficiency [10. Roundtree I.A. et al. Dynamic RNA modifications in gene expression regulation. Cell. 2017; 169: 1187-1200]. By coupling high-throughput sequencing with wet lab techniques, approaches such as MeRIP (methylated RNA immunoprecipitation)-seq [11. Meyer K.D. et al. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell. 2012; 149: 1635-1646], miCLIP (m6A individual-nucleotide-resolution cross-linking and immunoprecipitation)-seq [12. Linder B. et al. Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat. Methods. 2015; 12: 767-772], and bisulfite sequencing [13. Frommer M. et al. A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc. Natl. Acad. Sci. U. S. A. 1992; 89: 1827-1831] allow the profiling of DNA and RNA modifications [14. Novoa E.M. et al. Charting the unknown epitranscriptome. Nat. Rev. Mol. Cell Biol. 2017; 18: 339-340]. Although short-read DNA and RNA sequencing are easily scalable strategies, the profiling of epigenetic and epitranscriptomic modifications involves highly specialized protocols.
Oxford Nanopore Technologies (ONT) provides a sequencing method (nanopore sequencing) that allows the profiling of the genome and epigenome, or the transcriptome and epitranscriptome, with a single assay [15. Garalde D.R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods. 2018; 15: 201-206, 16. Rand A.C. et al. Mapping DNA methylation with high-throughput nanopore sequencing. Nat. Methods. 2017; 14: 411-413]. Nanopore sequencing generates long reads as each DNA or RNA molecule directly translocates through a nanopore. As the nucleic acids move through the nanopores in different nucleotide combinations, the changes in electrical current are measured (Figure 1A). This measured signal enables not only the determination of sequence bases, but also the detection of DNA and RNA modifications and the prediction of poly(A) tail length and RNA secondary structures through computational methodologies developed for these purposes. Because of the complex nature of the raw nanopore ion current signal, computational methods that use machine learning approaches have been key to extracting these additional layers of information. In this review, we provide an overview of computational approaches that facilitate the analysis of nanopore signal data, accompanied by a GitHub page listing the tools described^i. We highlight the different machine learning concepts that advanced basecalling, illustrate how they are applied for targeted sequencing, and introduce supervised and unsupervised approaches for identifying DNA and RNA modifications, RNA secondary structure prediction, and poly(A) tail length estimation. Finally, we provide an outlook on the future directions that should further enable the discovery of complex biological information from nanopore signal analysis with computational methods.

Basecalling is the process that translates raw ion current signal data from nanopore sequencing into a sequence of bases (Figure 1B).
The signal data correspond to the measured ion current changes from one nucleotide sequence of five (RNA) or six (DNA) bases (k-mer) to another during the translocation of a nucleic acid molecule through the pore. The noisy nature of the ion current signal makes it difficult to determine the associated k-mers from the signal data alone, because many k-mers share similar ranges of ion current values; this is especially true in the presence of homopolymers [17. Branton D. et al. The potential and challenges of nanopore sequencing. Nat. Biotechnol. 2008; 26: 1146-1153]. Early-generation basecallers employ an error-prone and time-consuming segmentation process, which divides the raw data series into k-mer-corresponding signal segments and translates these segments into k-mers [18. Teng H. et al. Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning. Gigascience. 2018; 7: giy037]. These basecallers generate reads with an accuracy of 85% or lower^ii. Since then, improvements in basecallers have been a major driver of increased nanopore sequencing accuracy, achieving over 98.3% of correctly identified bases^iii. The first basecallers, including ONT's cloud-based Metrichor^iv and the open-source software Nanocall [19. David M. et al. Nanocall: an open source basecaller for Oxford Nanopore sequencing data. Bioinformatics. 2017; 33: 49-55], an offline alternative to Metrichor, use a hidden Markov model (HMM) to decode the signal data. Assuming that a nucleic acid moves through a pore one nucleotide at a time, these HMM-based basecallers treat the ion current signals as a chain of observable events and the k-mers as states within the HMM [20. Timp W. et al. DNA base-calling from a nanopore using a Viterbi algorithm. Biophys. J. 2012; 102: L37-L39].
As the first nucleotides of each state overlap with the last nucleotides of the previous state, joint probabilities of a sequence of nucleotides can be calculated, and the path with the maximum total joint probability represents the final predicted sequence [20] (Figure 2A). To improve the accuracy of the predicted sequence, the basecalling algorithm PoreSeq introduces artificial mutations to the sequence and replaces short regions of the original best sequence with the corresponding regions of a mutated sequence that has a higher probability [21. Szalay T., Golovchenko J.A. De novo sequencing and variant calling with nanopores using PoreSeq. Nat. Biotechnol. 2015; 33: 1087-1091]. As HMM basecallers predict sequences based on the short-range dependencies of one k-mer on the next, they may overlook the long-range dependencies in nanopore sequencing. Furthermore, using a nucleotide sequence model that inaccurately describes the expected current values of the k-mers can cause basecalling biases with HMM basecallers [22. Boža V. et al. DeepNano: deep recurrent neural networks for base calling in MinION nanopore reads. PLoS One. 2017; 12: e0178751]. To overcome these constraints, ONT's Albacore (prior to version 2.0.1)^v and nanonet^vi, and the open-source software DeepNano [22] and BasecRAWller [23. Stoiber M., Brown J. BasecRAWller: streaming nanopore basecalling directly from raw signal. bioRxiv. 2017; (Published online May 1, 2017) https://doi.org/10.1101/133058] use a recurrent neural network (RNN) framework for basecalling.
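
The HMM decoding described above, in which overlapping k-mer states are chained and the path with the maximum joint probability is selected, can be illustrated with a minimal Viterbi sketch. Everything concrete here is an assumption made for illustration: the state length `K = 2` (real pore models use 5-mers or 6-mers), the evenly spaced current levels in `MEANS`, the Gaussian emission with `SIGMA = 1.0`, the uniform transition probability, and the simplification that exactly one event is observed per k-mer step. It is not the implementation used by Metrichor or Nanocall.

```python
import itertools
import math

K = 2  # toy state length, chosen only to keep the example small
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]
# Hypothetical pore model: one evenly spaced mean current level (pA) per k-mer.
MEANS = {kmer: 60.0 + 4.0 * i for i, kmer in enumerate(KMERS)}
SIGMA = 1.0


def log_emission(level, kmer):
    """Gaussian log-density of an event level given a k-mer (constant term dropped)."""
    z = (level - MEANS[kmer]) / SIGMA
    return -0.5 * z * z


def viterbi_basecall(events):
    """Return the most probable base sequence for a list of event mean levels."""
    log_step = math.log(1.0 / 4.0)  # uniform over the four overlapping successors
    score = {kmer: log_emission(events[0], kmer) for kmer in KMERS}
    back = []
    for level in events[1:]:
        new_score, pointers = {}, {}
        for kmer in KMERS:
            # Allowed predecessors end with the first K-1 bases of this k-mer.
            preds = [base + kmer[:-1] for base in "ACGT"]
            best = max(preds, key=lambda p: score[p])
            new_score[kmer] = score[best] + log_step + log_emission(level, kmer)
            pointers[kmer] = best
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final state.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    # The first k-mer contributes K bases, each later state adds one new base.
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


true_kmers = ["GA", "AT", "TT", "TA", "AC", "CA"]
print(viterbi_basecall([MEANS[k] for k in true_kmers]))  # GATTACA
```

Because each state only depends on its immediate predecessor, this kind of decoder captures the short-range k-mer-to-k-mer dependencies but has no mechanism for longer-range context.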
A unidirectional RNN takes in information from the ion current input vector and the previous hidden state to calculate the current hidden state and the associated probability distribution of bases [22] (Figure 2B). Albacore, nanonet, and DeepNano use a bidirectional RNN, which incorporates information from previous and future states of the ion current input vector to improve prediction accuracy [22] (Figure 2C). Still, bidirectional RNNs are time consuming; therefore, BasecRAWller, which aims to achieve real-time basecalling, uses two unidirectional RNNs to both segment and basecall the sequence in a streaming fashion, resulting in a faster overall run time [23].

These early basecallers depend on the segmentation process to define boundaries of segments based on a sharp change in signals (Figure 1). Segmentation can be error prone due to the varying translocation speed and the noisy signal [18]. To address this, segmentation-free basecallers have been developed, such as ONT's Albacore version 2.0.1^v and the open-source software Chiron [18]. To eliminate the segmentation step, Chiron combines a convolutional neural network (CNN) for extracting signal features and an RNN for predicting nucleotide probability.
Then, it implements a connectionist temporal classification (CTC) decoder to select the base with the highest probability at each position (Figure 2D) and performs a many-to-one mapping to finalize the complete sequence [18]. Although Chiron's segmentation-free approach outperforms the segmentation-dependent methods, the RNN framework's reliance on results from previous time points leads to long running times. To speed up basecalling, Causalcall allows parallel processing by inputting segmented ion current measurements as a matrix into a temporal convolutional network, which models the ion current signal and calculates the nucleotide base occurrence probability at each time point. It uses a CTC decoder to output the base sequence with the highest probability for each fixed-size signal input and overlaps the base sequences to finalize the complete sequence [24. Zeng J. et al. Causalcall: nanopore basecalling using a temporal convolutional network. Front. Genet. 2019; 10: 1332]. The combination of a CNN and a CTC decoder is also used in ONT's research basecaller Bonito^vii, which has achieved an unprecedentedly high basecalling accuracy of 98.3%, making the accuracy of nanopore sequencing comparable to that of next-generation sequencing^iii.

Nanopore sequencing has a unique ReadUntil feature that can eject reads in real time and thereby free up the pore for sequencing specific reads of interest. To determine whether a read is a target in real time, ReadUntil requires rapid read classification based on as few nucleotides as possible from the reads. The ReadUntil feature can increase the sequencing depth for specific genomic regions, which enables targeted sequencing for applications such as sequencing-based diagnosis or novel microbial genome discovery from metagenomic samples [25. Payne A. et al. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat. Biotechnol. 2021; 39: 442-450, 26. Kovaka S. et al. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat. Biotechnol. 2021; 39: 431-441]. Approaches that make use of the ReadUntil feature include Readfish [25], UNCALLED [26], and SquiggleNet [27. Bao Y. et al. Real-time, direct classification of nanopore signals with SquiggleNet. bioRxiv. 2021; (Published online January 20, 2021) https://doi.org/10.1101/2021.01.15.426907]. The Readfish pipeline translates raw signals to nucleotide sequences in real time with guppy, aligns the sequences to the reference, and then decides whether to eject the reads from the pores [25]. Similar success in real-time mapping is seen with the UNCALLED algorithm [26]. UNCALLED first converts signals into events (k-mers) with an HMM and then searches through the reference genome for matches that are consistent with the event-matched k-mers. After clustering consistent reads and reference coordinates, UNCALLED filters out false positives and reports the best-supported location [26].
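
The map-then-decide logic described above (look up event-derived k-mers in the reference, cluster consistent hits, report the best-supported location, then decide whether to eject) can be sketched as follows. This is an illustration under stated assumptions, not UNCALLED's or Readfish's implementation: a plain dictionary index stands in for the reference search, clustering is reduced to voting on the offset between reference and read positions, and the function names, the `min_support` threshold, and the policy of keeping unmapped reads are all hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read_kmers(read_kmers, index, min_support=3):
    """Vote for the reference start implied by each k-mer hit.

    read_kmers: k-mers decoded from consecutive events of one read.
    Returns (reference_start, support), or None if no offset has enough votes.
    """
    votes = Counter()
    for read_pos, kmer in enumerate(read_kmers):
        for ref_pos in index.get(kmer, ()):
            # Hits from the same true location share ref_pos - read_pos.
            votes[ref_pos - read_pos] += 1
    if not votes:
        return None
    start, support = votes.most_common(1)[0]
    return (start, support) if support >= min_support else None


def should_eject(location, targets):
    """Eject a mapped read unless its start falls inside a target interval.

    Assumed policy: reads that could not be placed yet are kept sequencing.
    """
    if location is None:
        return False
    start, _ = location
    return not any(lo <= start < hi for lo, hi in targets)


reference = "ACGTTGCAAGCTTAGGCTAACG"
index = build_kmer_index(reference, k=4)
read = reference[8:16]
read_kmers = [read[i:i + 4] for i in range(len(read) - 3)]
location = map_read_kmers(read_kmers, index)
print(location, should_eject(location, targets=[(0, 5)]))  # (8, 5) True
```

The sooner enough consistent votes accumulate, the fewer bases need to pass through the pore before the decision, which is the constraint the ReadUntil feature imposes on any classifier.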
SquiggleNet uses a CNN to classify reads with a model learned from reference training data [27]. These approaches have allowed the ReadUntil feature to classify target sequences in real time. These methods can be used effectively for targeted sequencing of microbial genomes and human cancer genes [25, 26], leading to an enrichment of the sequence of interest without the requirement for additional experiments.

Along with the basecalled sequences, downstream analyses of direct DNA and RNA sequencing also require reference sequence-aligned raw signals as inputs (Figure 1C). Segmentation describes the process that performs this raw signal-to-reference sequence alignment. Two methods for performing segmentation are tombo's resquiggle^viii (previously nanoraw [28. Stoiber M. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. bioRxiv. 2017; (Published online December 15, 2016) https://doi.org/10.1101/094672]) and nanopolish's eventalign [29. Loman N.J. et al. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat. Methods. 2015; 12: 733-735]. Tombo's resquiggle first identifies event boundaries based on large shifts of signal level, as these are associated with a change in the nucleotide that occupies the nanopore.
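
The idea of placing event boundaries at large shifts in signal level can be illustrated with a simple windowed mean-difference detector. This is a minimal sketch under assumptions, not tombo's actual procedure: the function names, the `window` size, and the `threshold` in picoamperes are hypothetical choices made for the example.

```python
def detect_event_boundaries(signal, window=3, threshold=5.0):
    """Return indices where the mean signal level shifts by at least `threshold`.

    For each candidate index i, the mean of the `window` samples before i is
    compared with the mean of the `window` samples starting at i.
    """
    n = len(signal)
    scores = [0.0] * n
    for i in range(window, n - window + 1):
        left = sum(signal[i - window:i]) / window
        right = sum(signal[i:i + window]) / window
        scores[i] = abs(right - left)

    boundaries = []
    for i in range(window, n - window + 1):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        is_peak = scores[i] == max(scores[lo:hi])
        # Keep local maxima above threshold, at least `window` samples apart.
        if scores[i] >= threshold and is_peak:
            if not boundaries or i - boundaries[-1] >= window:
                boundaries.append(i)
    return boundaries


def event_means(signal, boundaries):
    """Summarize each segment between consecutive boundaries by its mean level."""
    edges = [0] + list(boundaries) + [len(signal)]
    return [sum(signal[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


signal = [80.0] * 5 + [100.0] * 5 + [70.0] * 5
bounds = detect_event_boundaries(signal)
print(bounds, event_means(signal, bounds))  # [5, 10] [80.0, 100.0, 70.0]
```

Real signals are noisier and translocation speed varies, so fixed windows and thresholds like these would miss or split events; that sensitivity is one reason segmentation is described as error prone earlier in this review.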
Tombo then assigns these signal events to their corresponding reference sequences using a dynamic time warping algorithm. Nanopolish's eventalign assigns ion current signals to the reference sequence using an adaptive banded alignment that identifies the most likely sequence associated with the signal for each read. By aligning raw signals to a reference sequence, both tombo's resquiggle and nanopolish's eventalign allow the extraction of biological information from direct DNA and RNA sequencing signal data in downstream analyses.

Analyzing reference genome-aligned signals from direct DNA sequencing enables the extraction of biological information such as DNA modifications and/or chromatin accessibility (Figure 1D). The most common DNA modifications include N4-methylcytosine (4mC), 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), and N6-methyladenine (6mA) [28, 29, 30. Liu Q. et al. NanoMod: a computational tool to detect DNA modifications using Nanopore long-read sequencing data. BMC Genomics. 2019; 20: 78, 31. Simpson J.T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods. 2017; 14: 407-410, 32. Ni P. et al. DeepSignal: detecting DNA methylation state from nanopore sequencing reads using deep-learning. Bioinformatics. 2019; 35: 4586-4595, 33. Lee I. et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat. Methods. 2020; 17: 1191-1199, 34. McIntyre A.B.R. et al. Single-molecule sequencing detection of N6-methyladenine in microbial reference materials. Nat. Commun. 2019; 10: 579, 35. Jin Z., Liu Y. DNA methylation in human diseases. Genes Diseases. 2018; 5: 1-8, 36. Flusberg B.A. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods. 2010; 7: 461-465, 37. Shah K. et al. Adenine methylation in Drosophila is associated with the tissue-specific expression of developmental and regulatory genes. G3. 2019; 9: 1893-1900]^x, which are known to regulate transcription and alter biological processes, with some of them also likely to have clinical relevance [35]. The detection of the GpC modification can also allow the profiling of chromatin accessibility [33]. By analyzing direct DNA sequencing signals with supervised machine learning methods or with statistical methods that do not rely on training data, computational tools can discover novel DNA modifications and infer chromatin accessibility in a high-throughput manner (Table 1).

Table 1. Overview of computational methods used for analyzing direct DNA and RNA sequencing data (a comprehensive list of nanopore analysis tools is available online^i)

Application | Tool | Data | Type/modification analysis | Refs
Infer chromatin accessibility | nanoNOMe (nanopolish extension) | DNA | CpG, GpC methylation | [33]
Detect DNA modification | nanopolish call-methylation | DNA | 5mC | [31]
Detect DNA modification | SignalAlign | DNA | 5mC, 5hmC, 6mA | [16]
Detect DNA modification | mCaller | DNA | 6mA | [34]
Detect DNA modification | DeepSignal | DNA | 6mA | [32]
Detect DNA modification | NanoMod | DNA | De novo DNA modification detection | [30]
Detect RNA modification | tombo detect_modifications | DNA/RNA | Alternate base detection | [28]
Detect RNA modification | MINES | RNA | m6A | [42. Lorenz D.A. et al. Direct RNA sequencing enables m6A detection in endogenous transcript isoforms at base-specific resolution. RNA. 2020; 26: 19-28]
Detect RNA modification | EpiNano | RNA | m6A | [43. Liu H. et al. Accurate detection of m6A RNA modifications in native RNA sequences. Nat. Commun. 2019; 10: 4079]
Detect RNA modification | Nanom6A | RNA | m6A | [44. Gao Y. et al. Quantitative profiling of N6-methyladenosine at single-base resolution in stem-differentiating xylem of Populus trichocarpa using nanopore direct RNA sequencing. Genome Biol. 2021; 22: 22]
Detect RNA modification | m6anet | RNA | m6A | [45. Hendra C. et al. Detection of m6A from direct RNA sequencing using a Multiple Instance Learning framework. bioRxiv. 2021; (Published online September 22, 2021. https://doi.org/10.1101/2021.09.20.461055)]
Detect RNA modification | nano-ID | RNA | 5-EU | [46. Maier K.C. et al. Native molecule sequencing by nano-ID reveals synthesis and stability of RNA isoforms. Genome Res. 2020; 30: 1332-1344]
Detect RNA modification | nanoRMS | RNA | Ψ, Nm, and comparative RNA modification detection | [47. Begik O. et al. Quantitative profiling of pseudouridylation dynamics in native RNAs with nanopore sequencing. Nat. Biotechnol. 2021; (Published online May 13, 2021) https://doi.org/10.1038/s41587-021-00915-6]
Detect RNA modification | Yanocomp | RNA | Comparative RNA modification detection | [48. Parker M.T. et al. Yanocomp: robust prediction of m6A modifications in individual nanopore direct RNA reads. bioRxiv. 2021; (Published online June 16, 2021) https://doi.org/10.1101/2021.06.15.448494]
Detect RNA modification | DiffErr | RNA | Comparative RNA modification detection | ^ix
Detect RNA modification | ELIGOS | RNA | Comparative RNA modification detection | [49. Jenjaroenpun P. et al. Decoding the epitranscriptional landscape from native RNA sequences. Nucleic Acids Res. 2021; 49: e7]
Detect RNA modification | nanoDoc | RNA | Comparative RNA modification detection | [50. Ueda H. nanoDoc: RNA modification detection using nanopore raw reads with Deep One-Class Classification. bioRxiv. 2020; (Published online September 13, 2020) https://doi.org/10.1101/2020.09.13.295089]
Detect RNA modification | nanocompore | RNA | Comparative RNA modification detection | [51. Leger A. et al. RNA modifications detection by comparative nanopore direct RNA sequencing. bioRxiv. 2019; (Published online November 15, 2019) https://doi.org/10.1101/843136]
Detect RNA modification | DRUMMER | RNA | Comparative RNA modification detection | [52. Price A.M. et al. Direct RNA sequencing reveals m6A modifications on adenovirus RNA are necessary for efficient splicing. Nat. Commun. 2020; 11: 6016]
Detect RNA modification | xPore | RNA | Differential RNA modification rate analysis | [53. Pratanwanich P.N. et al. Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat. Biotechnol. 2021; (Published online July 1, 2021) https://doi.org/10.1038/s41587-021-00949-w]
Predict RNA 2° structure | nanoSHAPE | RNA | RNA structure (Nm, 2′-O-acetyl) | [54. Stephenson W. et al. Direct detection of RNA modifications and structure using single molecule nanopore sequencing. bioRxiv. 2020; (Published online June 01, 2020) https://doi.org/10.1101/2020.05.31.126763]
Predict RNA 2° structure | PORE-cupine | RNA | RNA structure (NAI-N3) | [55. Aw J.G.A. et al. Determination of isoform-specific RNA structure with nanopore long reads. Nat. Biotechnol. 2021; 39: 336-346]
Estimate poly(A) tail length | nanopolish polya | RNA | PolyA tails | [56. Workman R.E. et al. Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods. 2019; 16: 1297-1305]
Estimate poly(A) tail length | tailfindr | RNA | PolyA tails | [56, 57. Krause M. et al. tailfindr: alignment-free poly(A) length measurement for Oxford Nanopore RNA and DNA sequencing. RNA. 2019; 25: 1229-1241]

To detect specific DNA modifications, supervised learning-based modification detection tools can be trained using data sets of experimentally validated modification sites (labeled data). Labels for modified cytosines, including the GpC modifications used for inferring chromatin accessibility [33], can be obtained through bisulfite sequencing [16, 31, 32, 33], while 6mA labels can be obtained from nucleotides artificially methylated with methyltransferases [16] or from orthogonal PacBio sequencing of naturally existing modifications [34, 36]. Supervised learning methods for DNA modification identification include nanopolish, SignalAlign, mCaller, DeepSignal, and the alternative model mode of tombo's detect_modifications module [28. Stoiber M. et al. De novo identification of DNA modifications enabled by genome-guided nanopore signal processing. bioRxiv. 2017; (Published online D
