Indexing Text using the Ziv-Lempel Trie.
Gonzalo Navarro
Let a text of u characters over an alphabet of size s be
compressible to n symbols by the LZ78 or LZW algorithm. We show how to
build a data structure, called the LZ-index, based on the Ziv-Lempel trie that
takes 4n log_2(n) (1+o(1)) bits of space (that is, 4 times the entropy of
the text) and reports the R occurrences of a pattern of length m
in worst case time O(m^3 log(s) + (m+R)log(n)).
We present a practical implementation of the LZ-index, which is faster than
current alternatives when we take into consideration the time to report the
positions or text contexts of the occurrences found.