Alphabet Partitioning for Compressed Rank/Select and Applications

Jérémy Barbay, Travis Gagie, Gonzalo Navarro, and Yakov Nekrich

We present a data structure that stores a string S[1..n] over the alphabet [1..s] in nH_0(S) + o(n)(H_0(S)+1) bits, where H_0(S) is the zero-order entropy of S. This data structure supports the operators access and rank in time O(log log s), and the select operator in constant time. This result improves on previously known data structures using nH_0(S) + o(n log s) bits, where on highly compressible instances the redundancy o(n log s) can cease to be negligible compared to the nH_0(S) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) the first representation of a string S using nH_k(S) + o(n) log s bits and supporting access, rank, and select in poly-loglog time; (ii) faster search times for the smallest existing full-text self-index; (iii) compressed permutations p with times for p() and p^(-1)() improved to log-logarithmic; and (iv) the first compressed representation of dynamic collections of disjoint sets.