A General Practical Approach to Pattern Matching
over Ziv-Lempel Compressed Text
Gonzalo Navarro and Mathieu Raffinot
We address the problem of string matching on Ziv-Lempel
compressed text. The goal is to search a pattern in a text without
uncompressing it. This is a highly relevant issue
to keep compressed text databases where efficient searching is still
possible.
We develop a general technique for string matching when the text comes
as a sequence of blocks. This abstracts the essential features of Ziv-Lempel
compression. We then apply the scheme to each particular type of compression.
We present the first algorithm to find all the matches of a pattern
in a text compressed using LZ77.
When we apply our scheme to LZ78, we obtain a much more efficient search
algorithm, which is faster than uncompressing the text and then searching on
it.
Finally, we propose a new hybrid compression scheme which is between LZ77
and LZ78, being in practice as good to compress as LZ77 and as fast to search
in as LZ78.