Rank and Select Revisited and Extended
Veli Mäkinen and Gonzalo Navarro
The intricate connection between the Burrows-Wheeler transform (BWT)
and the so-called rank and select dictionaries over
symbol sequences is the basis of most successful approaches to
compressed text indexing. Rank of a symbol at a given position equals
the number of times the symbol appears in the corresponding prefix of
the sequence. Select is the inverse, retrieving the positions of the
symbol occurrences. Experience has shown that improvements to
rank/select algorithms, in combination with the BWT, turn into improved
compressed text indexes.
This paper is devoted to alternative implementations and extensions of
rank and select dictionaries. First, we show that one can use gap
encoding techniques to obtain constant time rank and select queries in
essentially the same space as what is achieved by the best current
direct solution. Second, we extend symbol rank and select to
substring rank and select, giving several space/time tradeoffs
for the problem. An application of these queries is in
position-restricted substring searching, where one can specify
the range in the text where the search is restricted to, and only
occurrences residing in that range are to be reported. In addition,
arbitrary occurrences are reported in text position order.
Several byproducts of our results display connections with searchable
partial sums, Chazelle's two-dimensional data structures, and Grossi
et al.'s wavelet trees.