A Lempel-Ziv Compressed Structure for Document Listing
Héctor Ferrada and Gonzalo Navarro
Document listing is the problem of preprocessing a set of sequences, called
documents, so that later, given a short string called the pattern, we retrieve
the documents where the pattern appears. While optimal-time and
linear-space solutions exist, the current emphasis is on reducing the space
requirements. Current document listing solutions build on compressed suffix
arrays. This paper is the first attempt to solve the problem using a
Lempel-Ziv compressed index of the text collection. We show that the resulting
solution outputs most of the matching documents very quickly and takes more
time for the final ones. This makes the index particularly useful for
interactive scenarios or when listing only some of the documents is sufficient.
Yet it also offers a competitive space/time tradeoff when returning the full
answer.
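
To make the problem statement concrete, the following is a minimal brute-force sketch of document listing. It is not the Lempel-Ziv index described in this paper and uses no compressed structure; the function name `list_documents` and the sample collection are assumptions chosen only for illustration. It is written as a generator so that a caller can stop after the first few results, which mirrors the scenario where listing some documents is sufficient.

```python
from itertools import islice
from typing import Iterator, List


def list_documents(documents: List[str], pattern: str) -> Iterator[int]:
    """Yield the identifier of each document that contains `pattern`.

    Hypothetical baseline: scans every document in full, so it only
    defines the expected output, not an efficient way to compute it.
    """
    for doc_id, text in enumerate(documents):
        if pattern in text:
            yield doc_id


# Illustrative collection (assumed, not taken from the paper).
docs = ["abracadabra", "banana", "cabana", "xyz"]

# Full answer: every document where the pattern occurs.
all_hits = list(list_documents(docs, "ana"))

# Partial answer: stop as soon as one matching document is reported.
first_hit = list(islice(list_documents(docs, "ana"), 1))
```

An index for document listing aims to report the same set of identifiers without scanning the whole collection at query time.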