Top-k Document Retrieval in Optimal Time and Linear Space
Gonzalo Navarro and Yakov Nekrich
We describe a data structure that uses O(n)-word space
and reports k most relevant documents that contain a query
pattern P in optimal O(|P|+k) time.
Our construction supports an ample set of important relevance measures,
such as the frequency of P in a document and the minimal distance between
two occurrences of P in a document.
We show how to reduce the space of the data
structure from O(n log n) bits to O(n(log s + log D + log log
n)),
where s is the alphabet size and D is the total number of documents.