###
New Techniques for Regular Expression Searching

####
Gonzalo Navarro and Mathieu Raffinot

We present two new techniques for regular expression searching and use them to
derive faster practical algorithms.
Based on the specific properties of Glushkov's nondeterministic finite automaton
construction algorithm, we show how to encode a deterministic finite
automaton (DFA) using *O(m*2^m)* bits, where *m* is the number of
characters, excluding operator symbols, in the regular expression. This
compares favorably against the worst case of *O(m*2^m*|S|)$ bits needed by
a classical DFA representation (where **S* is the alphabet) and
*O(m*2^(2m))* bits needed by the Wu and Manber approach implemented in
*Agrep*.

*
We also present a new way to search for regular expressions, which is
able to skip text characters. The idea is to determine the minimum
length **l* of a string matching the regular expression,
manipulate the original automaton so that it recognizes all the
reverse prefixes of length up to *l* of the strings originally accepted,
and use it to skip text characters as done for exact string matching
in previous work.

*
We combine these techniques into two algorithms, one able and one unable to
skip text characters. The algorithms are simple to implement, and our
experiments show that they permit fast searching for regular expressions,
normally faster than any existing algorithm.
*