###
Ranked Document Selection

####
Ian Munro, Gonzalo Navarro, and Sharma Thankachan

Let *D* be a collection of string
documents of *n* characters in total. The *top-k* document retrieval
problem} is to preprocess *D* into a data structure that, given a query
*(P,k)*, can return the *k* documents of *D* most relevant to
pattern *P*.
The relevance of a document *d* for a pattern *P* is given by a predefined
ranking function *w(P,d)*.
Linear space and optimal query time solutions already exist for this problem.
In this paper we consider a novel problem, *document selection*,
in which a query *(P,k)* aims to report the *k*th document most relevant to
*P* (instead of
reporting all top-*k* documents).
We present a data structure using *O(n log^e n)* space, for any
constant *e > 0*, answering selection queries in time *O(log k /
log log n)*, and a linear-space data structure answering queries in time
*O(log k)*, given the locus node of *P* in a (generalized) suffix tree of
*D*. We also prove that it is unlikely that a succinct-space solution for
this problem exists with poly-logarithmic query time, and that
*O(log k / log log n)* is indeed optimal within *O(n polylog
n)* space for
most text families. Finally, we present some additional space-time
trade-offs exploring the extremes of those lower bounds.