Practical Compressed Document Retrieval
Gonzalo Navarro, Simon Puglisi, and Daniel Valenzuela
Recent research on document retrieval for general texts has established
the virtues of explicitly representing the so-called document array,
which stores the document each pointer of the suffix array belongs to.
While it makes document retrieval faster, this array occupies a
significative amount of redundant space and is not easily compressible.
In this paper we present the first practical proposal to compress the
document array. We show that the resulting structure is significatively
smaller than the uncompressed counterpart, and than alternatives to the
document array proposed in the literature. We also compare various known
algorithms for document listing and top-k retrieval,
and find that the most useful combinations of algorithms run over our new
compressed document arrays.