###
Improved Compressed Indexes for Full-Text Document Retrieval

####
Djamal Belazzougui and Gonzalo Navarro

We give new space/time tradeoffs for compressed indexes that answer document
retrieval queries on general sequences. On a collection of *D* documents of
total length *n*, current approaches require at least *|CSA|+
O(n lg D / lg lg D)* or *2|CSA|+o(n)* bits of space, where
*CSA* is
a full-text index. Using monotone minimum perfect hash functions, we give new
algorithms for document listing with frequencies and top-*k* document
retrieval
using just *|CSA|+O(n lg lg lg D)* bits. We also improve current solutions
that use *2|CSA|+o(n)* bits, and consider other problems such as colored
range
listing, top-*k* most important documents, and computing arbitrary
frequencies.