Compressed Dynamic Range Majority and Minority Data Structures

Travis Gagie, Meng He, and Gonzalo Navarro

In the range a-majority query problem, we are given a sequence S[1..n] and a fixed threshold a in (0,1), and are asked to preprocess S such that, given a query range [i..j], we can efficiently report the symbols that occur more than a (j-i+1) times in S[i..j], which are called the range a-majorities. In this article we describe the first compressed dynamic data structure for range a-majority queries. It represents S in compressed space --- n H_k + o(n lg s) bits for any k = o(lg_s n), where s is the alphabet size and H_k ≤ H_0 ≤ lg s is the kth order empirical entropy of S --- and answers queries in O((lg n) / (a lg lg n)) time while supporting insertions and deletions in S in O((lg n) / a) amortized time. We then show how to modify our data structure to receive some b ≥ a at query time and report the range b-majorities in O((lg n) / (b lg lg n)) time, without increasing the asymptotic space or update-time bounds. The best previous dynamic solution has the same query and update times as ours, but it occupies O(n) words and cannot take advantage of being given a larger threshold b at query time. We also design the first dynamic data structure for range a-minority --- i.e., find a non-a-majority that occurs in a range --- and obtain space and time bounds similar to those for a-majorities. We extend the structure to find Theta(1/a) a-minorities at the same space and time cost. By giving up updates, we obtain static data structures with query time O((1/a) lg lg_w s) for both problems, on a RAM with word size w = Omega(lg n) bits, without increasing our space bound. Static alternatives reach time O(1/a), but they compress S only to zeroth order entropy (H_0) or they only handle small values of a, that is, lg(1/a) = o(lg s).