Reducing the Space Requirement of LZ-index
Diego Arroyuelo, Gonzalo Navarro, and Kunihiko Sadakane
The LZ-index is a compressed full-text self-index able to represent a
text T[1..u], over an alphabet of size s and with k-th
order empirical entropy Hk(T), using 4uHk(T) + o(u log s) bits
for any k = o(log_s u). It can report all the occ occurrences
of a pattern P[1..m] in T in O(m^3 log s + (m+occ) log u)
worst case time. This is the only existing data structure of size
O(uHk(T)) able of spending O(log u) time per occurrence reported.
Its main drawback is the factor 4 in its space complexity, which makes
it larger than other state-of-the-art alternatives. In this paper we present
two different approaches to reduce the space requirement of LZ-index. In both
cases we achieve (2+e) u H_k(T) + o(u log s) bits of space, for any
constant e > 0, and we simultaneously improve the search time to
O(m^2 log m + (m+occ) log u). Both indexes support displaying any
subtext of length l in optimal O(l / log_s u) time. In addition,
we show how the space can be squeezed to (1+e) u H_k(T) + o(u log s) to
obtain a structure with O(m^2) average search time for m >=
2 log_s u.