Regular Expression Searching on Compressed Text
Gonzalo Navarro
We present a solution to the problem of regular expression searching on
compressed text. The format we choose is the Ziv-Lempel family, specifically
the LZ78 and LZW variants. Given a text of length u compressed into length
n and a pattern of length m we report all the R occurrences of the
pattern in the text in O(2^m + mn + Rm log m) worst case time. On average
this drops to O(m^2 + (n+Rm) log m) or O(m^2+n+Ru/n) for most regular
expressions.
This is the first nontrivial result for this problem. The experimental results
show that our compressed search algorithm needs half the time necessary for
decompression plus searching, which is currently the only alternative.