###
List of Twin Clusters: a Data Structure for Similarity Joins in Metric Spaces

####
Rodrigo Paredes and Nora Reyes

The metric space model abstracts many proximity or similarity problems, where
the most frequently considered primitives are range and *k*-nearest neighbor
search, leaving out the *similarity join*, an extremely important primitive.
In fact, despite the great attention that this primitive has received in
traditional and even multidimensional databases, little has been done for
general metric databases.
We consider a particular type of similarity join: Given two sets of objects
and a distance threshold *r*, find all the object pairs (one from each set)
at distance at most *r*. For this sake, we devise a new metric index, coined
*List of Twin Clusters*, which indexes both sets jointly (instead of the
natural approach of indexing one or both sets independently).
Our results show significant speedups over
the basic quadratic-time naive alternative. Furthermore, we show that
our technique can be easily extended to other similarity join variants,
e.g., finding the *k*-closest pairs.