Regular Expression Searching on Compressed Text

Gonzalo Navarro

We present a solution to the problem of regular expression searching on compressed text. We focus on the Ziv-Lempel family of compression formats, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all R occurrences of the pattern in the text in O(2^m + mn + Rm log m) worst-case time. On average, this drops to O(m^2 + (n + Rm) log m), or to O(m^2 + n + Ru/n) for most regular expressions. This is the first nontrivial result for this problem. Experimental results show that our compressed search algorithm runs in half the time needed to decompress the text and then search it, which is currently the only alternative.
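The following sketch makes the LZ78 format and the decompress-then-search baseline concrete. It is not the compressed search algorithm described above. It parses a text into LZ78 phrases, each a pair (ref, ch) that extends an earlier phrase by one character. It then recovers the text and runs an ordinary regular expression search over it. The function names are hypothetical. Python's `re.finditer` reports non-overlapping leftmost matches, and these semantics may differ from the occurrence definition used in the paper.

```python
import re


def lz78_compress(text):
    """Parse text into LZ78 phrases (hypothetical helper).

    Each phrase is a pair (ref, ch): ref is the number of an earlier phrase
    (0 denotes the empty phrase) and ch is the character appended to it.
    """
    dictionary = {"": 0}  # phrase string -> phrase number
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(phrases)
            current = ""
    if current:
        # The text ended inside a known phrase: emit it as its parent plus
        # its last character (every prefix of a phrase is also a phrase).
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the text by expanding each phrase from its parent."""
    blocks = [""]  # block 0 is the empty phrase
    for ref, ch in phrases:
        blocks.append(blocks[ref] + ch)
    return "".join(blocks[1:])


def decompress_then_search(phrases, pattern):
    """Baseline: fully decompress, then scan with Python's regex engine."""
    text = lz78_decompress(phrases)
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


phrases = lz78_compress("abababab")
print(phrases)
print(decompress_then_search(phrases, "a(b|c)a"))
```

The baseline's cost includes rebuilding all u characters of the text. An algorithm that searches the compressed text directly works on the phrase list instead, which is the source of the savings claimed above.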