Top-k Document Retrieval in Compact Space and Near-Optimal Time

Gonzalo Navarro and Sharma Thankachan

Let DD = {d_1,d_2,...d_D} be a given set of D string documents of total length n. Our task is to index DD such that the k most relevant documents for an online query pattern P of length p can be retrieved efficiently. There exist linear space data structures of O(n) words for answering such queries in optimal O(p+k) time. In this paper, we describe a compact index of size |CSA| + n log D + o(n log D) bits with near optimal time, O(p + k log* n), for the basic relevance metric term-frequency, where |CSA| is the size (in bits) of a compressed full-text index of DD, and log* n is the iterated logarithm of n.