A Bit-parallel approach to Suffix Automata: Fast Extended String Matching.
Gonzalo Navarro and Mathieu Raffinot
We present a new algorithm for string matching. The algorithm, called BNDM,
is the bit-parallel simulation of a known (but recent) algorithm called BDM.
BDM skips characters using a "suffix automaton" which is made
deterministic in the preprocessing. BNDM, instead, simulates the
nondeterministic version using bit-parallelism.
This algorithm is 20%-25% faster than BDM, 2-3 times faster than other
bit-parallel algorithms, and 10%-40% faster than all the Boyer-Moore
family. This makes it the fastest algorithm in all cases except for
very short or very long
patterns (e.g. on English text it is the fastest between 5 and 110 characters).
Moreover, the algorithm is very simple, allowing to easily implement other
variants of BDM which are extremely complex in their original formulation.
We show that, as other bit-parallel algorithms, BNDM can be extended to handle
classes of characters in the pattern and in the text, multiple patterns and
to allow errors in the pattern or in the text, combining simplicity, efficiency
and flexibility.
We also generalize the suffix automaton definition to handle
classes of characters. To the best of our knowledge, this
extension has not been studied before.