###
Compact DFA Representation for Fast Regular Expression Search

####
Gonzalo Navarro and Mathieu Raffinot

We present a new technique to encode a deterministic finite automaton (DFA).
Based on the specific properties of Glushkov's nondeterministic finite automaton
(NFA) construction algorithm, we are able to encode the DFA using
*(m+1)(2^{m+1}+|Sigma|)* bits, where *m* is the number of characters (excluding
operator symbols) in the regular expression and *Sigma* is the alphabet. This
compares favorably against the worst case of *(m+1)2^{m+1}|Sigma|* bits
needed by a classical DFA representation and *m(2^{2m+1}+|Sigma|)* bits needed
by the Wu and Manber approach implemented in *Agrep*.
Our approach is practical and simple to implement, and it permits searching
regular expressions of moderate size (which include most cases of interest)
faster than with any previously existing algorithm, as we show experimentally.