Practical Rank/Select Queries over Arbitrary Sequences
Francisco Claude and Gonzalo Navarro
We present a practical study on the compact representation of sequences
supporting rank, select, and access queries. While there
are several theoretical solutions to the problem, only a few have been tried
out, and there is little idea on how the others would perform, especially in the
case of sequences with very large alphabets. We first present a compressed
representation for bit sequences that is competitive with the existing ones
when the sequences are not too compressible. It also has nice local
compression properties, and we show that this makes it an excellent tool for
compressed text indexing in combination with the Burrows-Wheeler transform.
This shows the practicality of a recent theoretical proposal [Mäkinen and
Navarro, SPIRE 2007], achieving spaces never seen before. Second, for general
sequences, we tune wavelet trees for
the case of very large alphabets, by removing their pointer information. We
show that this gives an excellent solution for representing a sequence within
zero-order entropy space, in cases where the large alphabet poses a serious
challenge to typical encoding methods. We also present the first
implementation of Golynski et al.'s representation [SODA 2006], which turns
out to be the most competitive in time, yet the most space demanding. We show
applications to natural language and graph compression.