Dynamic Entropy-Compressed Sequences and Full-Text Indexes

Veli Mäkinen and Gonzalo Navarro

Given a sequence of n bits with binary zero-order entropy H0, we present a dynamic data structure that requires nH0 + o(n) bits of space, which is able of performing rank and select, as well as inserting and deleting bits at arbitrary positions, in O(log n) time (amortized for insertions and deletions), or fully worst-case time O(log^2 n). This extends previous results by Kon et al. [ISAAC 2003] achieving O(log n / log log n) time but requiring n+o(n) bits of space, and by Raman et al. [SODA 2002] which have constant query time but a static structure. Our result becomes the first entropy-bound dynamic data structure for rank and select over bit sequences.

We then show how the above result can be used to build a dynamic full-text self-index for a collection of texts over an alphabet of size s, of overall length n and zero-order entropy H0. The index requires nH0 + o(n log s) bits of space, and can count the number of occurrences of a pattern of length m in time O(m log n log s). Reporting the occ occurrences can be supported in O(occ log^2 n log s) time, paying O(n) extra space. Insertion of text to the collection takes O(log n log sigma) amortized time per symbol, which becomes O(log^2 n log s) for deletions. This improves a previous result by Chan et al. [CPM 2004]. As a consequence, we obtain an O(n log n log s) time construction algorithm for a compressed self-index requiring nH0 + o(n log s) bits working space during construction.