###
Solving Similarity Joins and Range Queries in Metric Spaces with the List of Twin Clusters

####
Rodrigo Paredes and Nora Reyes

The metric space model abstracts many proximity or similarity problems, where
the most frequently considered primitives are range and *k*-nearest neighbor
search, leaving out the *similarity join*, an extremely important primitive.
In fact, despite the great attention that this primitive has received in
traditional and even multidimensional databases, little has been done for
general metric databases.
We solve two variants of the similarity join problem:
(1) *range joins*: Given two sets of objects and a distance threshold *r*,
find all the object pairs (one from each set) at distance at most *r*; and
(2) *k-closest pair joins*: Find the *k* closest object pairs (one from
each set). For this sake, we devise a new metric index, coined
*List of Twin Clusters* (LTC), which indexes both sets jointly,
instead of the natural approach of indexing one or both sets independently.
Finally, we show how to use the LTC in order to solve classical range queries.
Our results show significant speedups over the basic quadratic-time naive alternative
for both join variants, and that the LTC is competitive with the
original *list of clusters* when solving range queries.
Furthermore, we show that our technique has a great potential for improvements.