Increased Bit-Parallelism for Approximate String Matching.
Heikki Hyyrö, Kimmo Fredriksson and Gonzalo Navarro.
Bit-parallelism permits executing several operations simultaneously over a set
of bits or numbers stored in a single computer word. This technique permits
searching for the approximate occurrences of a pattern of length m in a
text of length n in time O(ceil(m/w) n), where w is the
number of bits in the computer word. Although this is asymptotically the
optimal speedup over the basic O(mn) time algorithm, it wastes
bit-parallelism's power in the common case where m is much smaller than
w, since w - m bits of each computer word go unused.
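
To make the baseline concrete, the following is a minimal sketch of one well-known bit-parallel scheme for approximate matching under edit distance (a Myers-style bit-vector recurrence). It is an illustration chosen here, not necessarily the exact algorithm this paper builds on. It handles only the case m <= w, where one word holds the whole pattern and each text character costs O(1) word operations. The function name approx_search and the parameter w are assumptions introduced for this example; Python integers are unbounded, so w only serves as a check.

```python
def approx_search(pattern, text, k, w=64):
    """Return the 0-based text positions where an occurrence of `pattern`
    with at most k differences (edit distance) ends.

    Sketch only: assumes 1 <= len(pattern) <= w so the pattern fits one word.
    """
    m = len(pattern)
    if not 0 < m <= w:
        raise ValueError("this sketch needs 1 <= len(pattern) <= w")
    mask = (1 << m) - 1          # keeps only the m bits actually in use
    top = 1 << (m - 1)           # bit of the last pattern position

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0             # vertical deltas: +1 everywhere, -1 nowhere
    score = m                    # value of the last cell of the current column
    ends = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal delta +1
        mh = pv & xh                    # horizontal delta -1
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        # Shift in 0 (not 1): an occurrence may start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= k:
            ends.append(j)
    return ends
```

For instance, approx_search("abc", "xxabcxx", 0) reports the single end position 4. Note that only m of the w bits carry information in every operation above, which is the waste the abstract refers to.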
In this paper we explore different ways to increase the bit-parallelism when
the search pattern is short. First, we show how multiple patterns can be packed
in a single computer word so as to search for multiple patterns simultaneously.
Instead of paying O(rn) time to search for r patterns of length
m, we obtain O(ceil(r/floor(w/m)) n) time. Second, we show how
the mechanism permits boosting the search for a single pattern of length
m < w, which can be searched for in time O(n / floor(w/m))
instead of O(n). Finally, we show how to extend these algorithms so that
the time bounds essentially depend on k instead of m, where
k is the maximum number of differences permitted.
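
As a rough illustration of what the bounds above mean, the sketch below counts word-sized steps per text character when floor(w/m) patterns share one word. The helper name word_steps_per_char and the sample values are assumptions made for this example, and the constant factors hidden by the O-notation are ignored.

```python
def word_steps_per_char(r, m, w=64):
    """Number of packed words needed for r patterns of length m,
    i.e. ceil(r / floor(w/m)), assuming each pattern fits in a word."""
    per_word = w // m                 # floor(w/m) patterns fit in one word
    if per_word < 1:
        raise ValueError("this sketch assumes m <= w")
    return -(-r // per_word)          # integer ceiling of r / per_word


# Hypothetical setting: r = 20 patterns of length m = 8, word size w = 64.
# Eight patterns share a word, so 3 words are processed per text character
# instead of 20 separate searches.
packed = word_steps_per_char(20, 8, 64)
```

Under the same assumptions, a single pattern with m = 8 and w = 64 leaves room for a factor of floor(w/m) = 8 in the O(n / floor(w/m)) bound.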
Our experimental results show that the algorithms work well in
practice, and are the fastest alternatives for a wide range of search
parameters.