Practical and Flexible Pattern Matching over Ziv-Lempel Compressed Text
Gonzalo Navarro and Mathieu Raffinot
We address the problem of string matching on Ziv-Lempel
compressed text. The goal is to search a pattern in a text without
uncompressing it. This is a highly relevant issue
to keep compressed text databases where efficient searching is still
possible.
We develop a general technique for string matching when the text comes
as a sequence of blocks. This abstracts the essential features of Ziv-Lempel
compression. We then apply the scheme to each particular type of compression.
We present an algorithm to find all the matches of a pattern
in a text compressed using LZ77.
When we apply our scheme to LZ78, we obtain a much more efficient search
algorithm, which is faster than uncompressing the text and then searching on
it.
Finally, we propose a new hybrid compression scheme which is between LZ77
and LZ78, being in practice as good to compress as LZ77 and as fast to search
in as LZ78.
We show also how to search some extended patterns on Ziv-Lempel compressed
text, such as classes of characters and approximate string matching.