The Ring: Worst-Case Optimal Joins in Graph Databases using (Almost)
No Extra Space
Diego Arroyuelo, Adrián Gómez-Brandón,
Aidan Hogan, Gonzalo Navarro, Juan Reutter, Javiel Rojas-Ledesma, and
Adrián Soto
We present an indexing scheme for triple-based graphs that supports join
queries in worst-case optimal (wco) time within compact space. This scheme,
called a ring, regards each triple as a cyclic string of length 3. Each
rotation of the triples is lexicographically sorted and the values of the last
attribute are stored as a column, so we obtain the order of the next column by
stably re-sorting the triples by its attribute. We show that, by representing
the columns with a compact data structure called a wavelet tree, this ordering
enables forward and backward navigation between columns without needing
pointers. These wavelet trees further support wco join algorithms and
cardinality estimations for query planning. While traditional data structures
such as B-Trees, tries, etc., require 6 index orders to support all possible
wco joins over triples, we can use one ring to index them all. This ring
replaces the graph and uses only sublinear extra space, thus supporting wco
joins in almost no space beyond storing the graph itself. Experiments querying
a large graph (Wikidata) in memory show that the ring offers nearly the best
overall query times while using only a small fraction of the space required by
several state-of-the-art approaches.
We then turn our attention to some theoretical results for indexing tables of
arity d higher than 3 in such a way that supports wco joins. While a
single ring of length d no longer suffices to cover all d! orders,
we need much fewer rings to index them all: O(2^d) rings with a small
constant. For example, we need 5 rings instead of 120 orders for d=5.
We show that our rings become a particular case of what we dub order
graphs, whose nodes are attribute orders and where stably sorting by some
attribute leads us from an order to another, thereby inducing an edge labeled
by the attribute. The index is then the set of columns associated with the
edges, and a set of rings is just one possible graph shape. We show that other
shapes, like for example a single ring instead of several ones of length
d can lead us to even smaller indexes, and that other more general
shapes are also possible. For example, we handle d=5 attributes within
space equivalent to 4 rings.