Alphabet Partitioning for Compressed Rank/Select and Applications
Jérémy Barbay, Travis Gagie, Gonzalo Navarro, and Yakov Nekrich
We present a data structure that stores a string S[1..n] over the
alphabet [1..s] in nH_0(S) + o(n)(H_0(S)+1) bits, where
H_0(S) is the zero-order entropy of S.
This data structure supports the operators access and rank in
time O(log log s), and the select operator in constant
time.
This result improves on previously known data structures using
nH_0(S) + o(n log s) bits, where on highly compressible instances
the redundancy o(n log s) can cease to be negligible compared to
the nH_0(S) bits that encode the data.
The technique is based on combining previous results through an ingenious
partitioning of the alphabet, and practical enough to be implementable. It
applies not only to strings, but also to several other compact data
structures.
For example, we achieve (i) the first representation of a string S
using nH_k(S) + o(n) log s bits and supporting access, rank, and
select in poly-loglog time; (ii) faster search times for the smallest
existing full-text self-index; (iii) compressed permutations p with
times for p() and p^(-1)() improved to log-logarithmic;
and (iv)
the first compressed representation of dynamic collections of disjoint
sets.