Tree Path Majority Data Structures

Travis Gagie, Meng He, Gonzalo Navarro, and Carlos Ochoa

We present the first solution to finding t-majorities on tree paths. Given a tree of n nodes, each with a label from [1..s], and a fixed threshold 0<t<1, such a query gives two nodes u and v and asks for all the labels that appear more than t |P_uv| times in the path P_uv from u to v, where |P_uv| denotes the number of nodes in P_uv. Note that the answer to any query is of size up to 1/t. On a w-bit RAM, we obtain a linear-space data structure with O((1/t) log log_w s) query time, which is worst-case optimal for polylogarithmic-sized alphabets. We also describe two succinct-space solutions with query time O((1/t) log* n log log_w s). One uses 2nH + 4n + o(n)(H+1) bits, where H ≤ lg s is the entropy of the label distribution; the other uses nH + O(n) + o(nH) bits. By using just o(n log s) extra bits, our succinct structures allow t to be specified at query time. We obtain analogous results to find a t-minority, that is, an element that appears between 1 and t |P_uv| times in P_uv.