Colored Range Queries and Document Retrieval
Travis Gagie, Gonzalo Navarro, and Simon Puglisi
Colored range queries are a well-studied topic in computa-
tional geometry and database research that, in the past decade, have
found exciting applications in information retrieval. In this paper we give
improved time and space bounds for three important one-dimensional
colored range queries -- colored range listing, colored range top-k queries
and colored range counting -- and, thus, new bounds for various document
retrieval problems on general collections of sequences. Specifically,
we first describe a framework including almost all recent results on colored
range listing and document listing, which suggests new combinations
of data structures for these problems. For example, we give the
fastest compressed data structures for colored range listing and document
listing, and an efficient data structure for document listing whose
size is bounded in terms of the high-order entropies of the library of
documents. We then show how (approximate) colored top-k queries can be
reduced to (approximate) range-mode queries on subsequences, yielding
the first efficient data structure for this problem. Finally, we show how
a modified wavelet tree can support colored range counting in logarithmic
time and space that is succinct whenever the number of colors is
superpolylogarithmic in the length of the sequence.