Faster Bit-parallel Approximate String Matching
Heikki Hyyrö and Gonzalo Navarro
We present a new bit-parallel technique for approximate string matching.
We build on two previous techniques. The first [Myers, J. of the ACM, 1999]
searches for a pattern of length m in a text of length n, permitting k
differences, in O(mn/w) time, where w is the width of the computer word.
The second [Navarro and Raffinot, ACM JEA, 2000] extends a sublinear-time
exact algorithm to approximate searching. The latter technique relies
internally on an O(kmn/w)-time algorithm [Wu and Manber, Comm. ACM, 1992],
which is slow but flexible enough to support all the required
operations. In this paper we show that the faster algorithm of Myers can be
adapted to support all those operations. This involves extending it to compute
edit distance, to search for any pattern suffix, and to detect in advance the
impossibility of a later match. The result is an algorithm that performs
better than the original version of Navarro and Raffinot and that is the
fastest for several combinations of m, k, and alphabet size that arise in
practice, for example in natural language searching and computational biology.
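
To illustrate the first building block, the following is a minimal Python sketch of the search variant of Myers' bit-parallel algorithm, in the formulation commonly given in the literature, for a pattern that fits in one computer word. It is not the algorithm proposed in this paper and includes none of the extensions listed above. The names myers_search and dp_search are hypothetical, and because Python integers are unbounded, an explicit mask is assumed to emulate an m-bit word. A plain dynamic-programming version is included only as a reference for what the bit vectors encode.

```python
def dp_search(pattern, text, k):
    """Reference O(mn) column-wise dynamic programming.

    Returns the 0-based text positions where an occurrence of the
    pattern with at most k differences ends.
    """
    m = len(pattern)
    col = list(range(m + 1))          # column for the empty text prefix
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]            # top cell is always 0 when searching
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(prev_diag + (pattern[i - 1] != c),  # match/substitute
                         col[i - 1] + 1,                     # cell above
                         old + 1)                            # cell to the left
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def myers_search(pattern, text, k):
    """Bit-parallel search; same result as dp_search.

    pv and mv hold the +1 and -1 vertical differences of the current
    column, one bit per pattern position.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1               # emulates an m-bit word (assumption)
    high = 1 << (m - 1)               # bit of the last pattern position
    peq = {}                          # peq[c]: positions where pattern has c
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    pv, mv, score = mask, 0, m        # initial column is 0, 1, ..., m
    ends = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask     # horizontal +1 differences
        mh = pv & xh                      # horizontal -1 differences
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = (ph << 1) & mask             # low bit 0: a match may start anywhere
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends
```

For example, myers_search("abc", "xxabcxx", 0) reports the single end position 4. Each text character costs a constant number of word operations here, which is where the O(mn/w) bound comes from once the pattern is split across ceil(m/w) words.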