Bidirectional r-indexes
Yuma Arakawa, Gonzalo Navarro, and Kunihiko Sadakane
Indexing highly repetitive texts is important in fields such as bioinformatics and versioned
repositories. The run-length compression of the Burrows-Wheeler transform (BWT) provides a
compressed representation particularly well-suited to text indexing. The r-index is one such index. It
enables fast locating of occurrences of a pattern within O(r) words of
space, where r is the number
of equal-letter runs in the BWT. Its mechanism of locating is to maintain one suffix array sample
along the backward-search of the pattern, and to compute all the pattern positions from that sample
once the backward-search is complete. In this paper we develop this algorithm further, and propose
a new bi-directional text index called the br-index, which supports extending the matched pattern
both in forward and backward directions, and locating the occurrences of the pattern at any step
of the search, within O(r + r_R) words of space, where r_R is the number of equal-letter runs in the
BWT of the reversed text. Our experiments show that the br-index captures the long repetitions of
the text, and outperforms the existing indexes in text searching allowing some mismatches except in
an internal part.