Space-efficient Construction of LZ-index
Diego Arroyuelo and Gonzalo Navarro
A compressed full-text self-index is a data structure that replaces
a text and in addition gives indexed access to it, while taking space
proportional to the compressed text size. The LZ-index, in particular,
requires 4uH_k(1+o(1)) bits of space, where u is the text length in
characters and H_k is its k-th order empirical entropy. Although in
practice the LZ-index needs 1.0-1.5 times the text size, its construction
requires much more main memory (around 5 times the text size), which limits
its applicability to large texts. In this paper we present a practical
space-efficient algorithm to construct the LZ-index, requiring
(4+e)uH_k+o(u) bits of space, for any constant 0<e<1, and
O(su) time, where s is the alphabet size. Our
experimental results show that our method is efficient in practice, needing
an amount of memory close to that of the final index.
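
To make the space bounds above concrete, the following minimal Python sketch computes the k-th order empirical entropy H_k of a text using the standard definition (the average, over all length-k contexts, of the zero-order entropy of the characters that follow each context). It then evaluates only the leading terms 4uH_k and (4+e)uH_k. The function names are hypothetical and chosen for illustration, the o(.) lower-order terms are ignored, and none of this is the authors' construction algorithm.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy(text, k):
    """Return H_k(text) in bits per character (standard definition)."""
    u = len(text)
    if u == 0:
        return 0.0
    # For every length-k context, count the characters that follow it.
    followers = defaultdict(Counter)
    for i in range(k, u):
        followers[text[i - k:i]][text[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        n = sum(counts.values())
        # n * H_0 of the characters following this context.
        total_bits += sum(c * log2(n / c) for c in counts.values())
    return total_bits / u


def index_leading_bits(text, k):
    """Leading term 4*u*H_k of the index size; lower-order terms ignored."""
    return 4 * len(text) * empirical_entropy(text, k)


def construction_leading_bits(text, k, e):
    """Leading term (4+e)*u*H_k of the construction space, for 0 < e < 1."""
    if not 0 < e < 1:
        raise ValueError("e must satisfy 0 < e < 1")
    return (4 + e) * len(text) * empirical_entropy(text, k)
```

For instance, `empirical_entropy("abab", 0)` is 1.0 bit per character, while `empirical_entropy("abab", 1)` is 0.0 because each character is fully determined by the one before it. On real texts the ignored lower-order terms can be significant, so these figures are only rough indications.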