Worst-Case Optimal Similarity Joins on Graph Databases
Diego Arroyuelo, Benjamin Bustos, Adrián Gómez-Brandón,
Aidan Hogan, Gonzalo Navarro, Juan Reutter
We extend the concept of worst-case optimal equijoins in graph
databases to the case where some nodes are required to be within
the k-nearest neighbors (k-NN) of others under some similarity
function. We model the problem by superimposing the database
graph with the k-NN graph and show that a variant of Leapfrog
TrieJoin (LTJ) implemented over a compact data structure called
the Ring can be seamlessly extended to integrate similarity clauses
with the equijoins in the LTJ query process, retaining worst-case
optimality in many relevant cases. Our experiments on a benchmark
that combines Wikidata and IMGpedia show that our enhanced LTJ
algorithm outperforms by a considerable margin a baseline that
first applies classic LTJ and then completes the query by applying
the similarity predicates. The difference is more pronounced on
queries where the similarity clauses are more densely connected to
the query, becoming of an order of magnitude in some cases.