Run-Length FM-index

Veli Mäkinen and Gonzalo Navarro

The FM-index is a succinct text index needing only O(H_k n) bits of space, where n is the text size and H_k is the kth order entropy of the text. Hidden in the sublinear factor lies an exponential dependence on the alphabet size, s. In this paper we show how the same ideas can be used to obtain an index needing O(H_k n) bits of space, with the constant factor depending only logarithmically on s. Our space complexity becomes better as soon as s log s > log n, which means in practice for all but very small alphabets, even with huge texts. We retain the same search complexity of the FM-index.