FM-KZ: An Even Simpler Alphabet-Independent FM-Index
Rafal Przywarski, Szymon Grabowski, Gonzalo Navarro, and Alejandro Salinger
In an earlier work we presented a simple FM-index variant, based on the idea
of Huffman-compressing the text and then applying the Burrows-Wheeler
transform over it. The main drawback of using Huffman was its lack of
synchronizing properties, forcing us to supply another bit stream indicating
the Huffman codeword boudaries. In this way, the resulting index needed
O(n(H0+1)) bits of space but with the constant 2 (concerning the main
term). There are several options aiming to mitigate the overhead in space,
with various effects on the query handling speed. In this work we propose
Kautz-Zeckendorf coding as a both simple and practical replacement for Huffman.
We dub the new index FM-KZ. We also present an efficient implementation of
the rank operation, which is the main building brick of the FM-KZ. Experimental
results show that our index provides an attractive space/time tradeoff in
comparison with existing succinct data structures, and in the DNA test
it even wins both in search time and space use. An additional asset of our
solution is its relative simplicity.