Bit-parallel Witnesses and their Applications to Approximate String Matching
Heikki Hyyrö and Gonzalo Navarro
We present a new bit-parallel technique for approximate string matching. We
build on two previous techniques. The first one, BPM [Myers, J. of the ACM,
1999], searches for a pattern of length m in a text of length n permitting
k differences in O(ceil(m/w) n) time, where w is the width of the computer word.
The second one, ABNDM [Navarro and Raffinot, ACM JEA, 2000], extends a
sublinear-time exact algorithm to approximate searching. Internally, ABNDM relies
on another algorithm, BPA [Wu and Manber, Comm. ACM, 1992], which runs in
O(k ceil(m/w) n) time. BPA is slower than BPM but flexible
enough to support all operations required by ABNDM. We improve previous ABNDM
analyses, showing that it is average-optimal in the number of inspected characters,
although the overall complexity is higher because of the O(k ceil(m/w)) work done
per inspected character. We then show that the faster BPM can be adapted to
support all the operations required by ABNDM. This involves extending it to
compute edit distance, to search for any pattern suffix, and to detect in
advance the impossibility of a later match. The solution to those challenges is
based on the concept of a witness, which permits sampling some dynamic
programming matrix values so as to bound, deduce, or compute others fast. The
resulting algorithm is average-optimal for m <= w, assuming the alphabet
size is constant. In practice, it performs better than the original ABNDM and
is the fastest algorithm for several combinations of m, k and alphabet
sizes that are useful, for example, in natural language searching and
computational biology. To show that the concept of witnesses can be used in
further scenarios, we also improve a recent bit-parallel algorithm based on
Myers' algorithm [Fredriksson, SPIRE 2003]. The use of witnesses greatly improves the
running time of this algorithm too.
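
For readers unfamiliar with BPM, the following is a minimal sketch of Myers-style bit-parallel approximate searching, assuming the commonly used diagonal-delta formulation of the recurrences. It illustrates only the baseline BPM search that the abstract builds on, not the witness technique or the ABNDM adaptation. The function name `bpm_search` and its return convention (0-based end positions) are assumptions made for illustration. Python integers are unbounded, so the sketch masks every vector to m bits; on real hardware this corresponds to the case m <= w.

```python
def bpm_search(pattern, text, k):
    """Return the 0-based text positions where an occurrence of `pattern`
    with at most k differences ends (illustrative sketch, not the paper's code)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    mask = (1 << m) - 1      # keeps every vector within m bits
    high = 1 << (m - 1)      # bit of the last pattern row

    # peq[c] has bit i set iff pattern[i] == c
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    vp = mask                # vertical deltas equal to +1 (initial column is 0..m)
    vn = 0                   # vertical deltas equal to -1
    score = m                # value of the last cell of the current column
    ends = []

    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        # d0: rows whose diagonal delta is 0
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal deltas equal to +1
        hn = vp & d0                    # horizontal deltas equal to -1

        if hp & high:
            score += 1
        elif hn & high:
            score -= 1

        # Shifting in a 0 keeps the top row at 0, so a match may start anywhere.
        x = (hp << 1) & mask
        vp = ((hn << 1) | ~(d0 | x)) & mask
        vn = x & d0

        if score <= k:
            ends.append(j)
    return ends
```

For example, `bpm_search("abc", "xabcx", 1)` returns `[2, 3, 4]`: the prefix "ab" already matches with one difference at position 2, the exact occurrence ends at position 3, and one extra character is tolerated at position 4. Each text character costs a constant number of word operations when m <= w, which is the source of the O(ceil(m/w) n) bound quoted above.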