Capítulo de livro Revisado por pares

Pattern matching with mismatches: A probabilistic analysis and a randomized algorithm

1992; Springer Science+Business Media; Linguagem: Inglês

10.1007/3-540-56024-6_3

ISSN

1611-3349

Autores

Mikhail J. Atallah, Philippe Jacquet, Wojciech Szpankowski,

Tópico(s)

DNA and Biological Computing

Resumo

Given a text of length n and a pattern of length m over some (possibly unbounded) alphabet, we consider the problem of finding all positions in the text at which the pattern “almost occurs”. Here by “almost occurs” we mean that at least some fixed fraction ρ of the characters of the pattern (for example, ≥ 60% of them) are equal to their corresponding characters in the text. We design a randomized algorithm that has O(n log m) worst-case time complexity and computes with high probability all of the almost-occurrences of the pattern in the text. This algorithm assumes that the fraction ρ is given as part of its input, and it works well even for relatively small values of ρ. It makes no assumptions about the probabilistic characteristics of the input. Our second contribution deals with the issue of which values of ρ correspond to the intuitive notion of similarity between pattern and text, and this leads us to the development of a probabilistic analysis for the case where both input strings are random (in the usual, i.e., Bernoulli, model).

Referência(s)