Improved Grammar-Based Compressed Indexes
Francisco Claude and Gonzalo Navarro
We introduce the first grammar-compressed representation of a sequence
that supports searches in time that depends only logarithmically on
the size of the grammar. Given a text T[1..u] that is represented by
a (context-free) grammar of n (terminal and nonterminal) symbols and
size N (measured as the sum of the lengths of the right hands of the
rules), a basic grammar-based representation of T takes N lg n
bits of space. Our representation requires 2N lg n + N lg u + eps
n lg n + o(N lg n) bits of space, for any 0. It can
find the positions of the occ occurrences of a pattern of length
m in T in O((m^2/eps) lg (lg u / lg n) +
(m+occ) lg n) time, and extract any substring of length l of
T in time O(l+h lg(N/h)), where h is the height of the grammar tree.