A Simple Alphabet-Independent FM-Index
Szymon Grabowski, Veli Mäkinen, Gonzalo Navarro, and Alejandro Salinger
We design a succinct full-text index based on the idea of Huffman-compressing
the text and then applying the Burrows-Wheeler transform over it. The resulting
structure can be searched as an FM-index, with the benefit of removing the sharp
dependence on the alphabet size, s, present in that structure. On a text
of length n with zero-order entropy H0, our index needs
O(n(H0+1)) bits of space, without any dependence on s. The
average search time for a pattern of length m is O(m(H0+1)),
under reasonable assumptions. Each position of a text occurrence can be
reported in worst case time O((H0+1)\log n), while any text substring
of length L can be retrieved in O((H0+1)L) average time in
addition to the previous worst case time.
Our index provides a relevant space/time tradeoff between existing succinct
data structures, with the additional interest of being easy to implement.
Our experimental results show that, although not among the most succinct, our
index is faster than the others in many aspects, even letting them use
significatively more space.