Article; Open access; Peer-reviewed

Fundamental Bounds for Sequence Reconstruction From Nanopore Sequencers

2016; Institute of Electrical and Electronics Engineers; Volume: 2; Issue: 1; Language: English

10.1109/tmbmc.2016.2630056

ISSN

2372-2061

Authors

Abram Magner, Jarosław Duda, Wojciech Szpankowski, Ananth Grama

Topic(s)

Algorithms and Data Compression

Abstract

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information-theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: 1) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length, and 2) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: 1) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases, and 2) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slowly growing (polylogarithmic) function of sequence length, implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.
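
As a rough illustration of the first insight only, the sketch below assumes a toy i.i.d. indel model: each base is independently preceded by a random inserted base with probability p_ins and deleted with probability p_del. This is a simplifying assumption, not the channel model analyzed in the paper, and the names p_no_indel, toy_indel_channel, and exact_match_rate are hypothetical. Under such a model, if each base suffers some indel event with probability p (here p = 1 - (1 - p_del)(1 - p_ins)), the probability that a length-n read passes with no event at all is (1 - p)^n = exp(n ln(1 - p)), which decays exponentially in n.

```python
import random


def p_no_indel(n: int, p: float) -> float:
    # Toy model (assumption): each of n bases independently suffers an indel
    # event with probability p, so an event-free read has probability (1 - p)^n.
    return (1.0 - p) ** n


def toy_indel_channel(seq: str, p_del: float, p_ins: float, rng: random.Random) -> str:
    # Hypothetical i.i.d. channel: before each base, insert a random base with
    # probability p_ins; then keep the base itself with probability 1 - p_del.
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice("ACGT"))  # spurious inserted base
        if rng.random() >= p_del:
            out.append(base)  # base survives (not deleted)
    return "".join(out)


def exact_match_rate(n: int, p_del: float, p_ins: float, trials: int, seed: int = 0) -> float:
    # Monte Carlo estimate of P(read == original) for random length-n sequences.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(n))
        if toy_indel_channel(seq, p_del, p_ins, rng) == seq:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Exponential decay of the event-free probability with read length.
    for n in (10, 100, 1000):
        print(n, p_no_indel(n, 0.1))
    # Per-base "no event" probability in the toy channel is (1 - p_del) * (1 - p_ins).
    closed_form = ((1 - 0.02) * (1 - 0.02)) ** 20
    print(closed_form, exact_match_rate(20, 0.02, 0.02, trials=20000))
```

The Monte Carlo exact-match rate can come out slightly above the closed form, because a few combinations of insertions and deletions happen to leave the read unchanged. The sketch says nothing about the replica bounds for replicated extrusion, which rest on the paper's own analysis.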
