Top-k Document Retrieval in Compact Space and Near-Optimal Time
Gonzalo Navarro and Sharma Thankachan
Let DD = {d_1,d_2,...d_D} be a given set of D string
documents of total length n. Our task is to index DD such that the
k most relevant documents for an online query pattern P of length
p can
be retrieved efficiently. There exist linear space data structures of
O(n)
words for answering such queries in optimal O(p+k) time.
In this paper, we describe a compact index of size
|CSA| + n log D + o(n log D) bits with near optimal time,
O(p + k log* n), for the basic relevance metric term-frequency,
where |CSA| is the size (in bits) of a compressed full-text index of
DD,
and log* n is the iterated logarithm of n.