Grammar-Compressed Indexes with Logarithmic Search Time
Francisco Claude, Gonzalo Navarro, and Alejandro Pacheco
Let a text T[1..n] be the only string generated by a context-free grammar
with g (terminal and nonterminal) symbols, and of size G (measured as the
sum of the lengths of the right-hand sides of the rules). Such a grammar, called
a grammar-compressed representation of T, can be encoded using
G lg G bits. We introduce the first grammar-compressed index that uses
O(G lg n) bits (precisely, G lg n + (2+e) G lg g for any constant
e > 0) and can find the occ occurrences of patterns
P[1..m] in time
O((m^2+occ) lg G). We implement the index and demonstrate its practicality
in comparison with the state of the art, on highly repetitive text collections.