Proximity Searching in Metric Spaces
Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-Yates and José Luis Marroquín
The problem of searching the elements of a set which are close to a given
query element under some similarity criterion has a vast number of applications
in many branches of computer science, from pattern recognition to textual and
multimedia information retrieval. We are interested in the rather general
case where the similarity criterion defines a metric space, instead
of the more restricted case of a vector space. A large number of solutions have
been proposed in different areas, in many cases without cross-knowledge.
Because of this, the same ideas have been reinvented several times, and very
different presentations have been given for the same approaches.
We present some basic results that explain the intrinsic difficulty of the
search problem. This includes a quantitative definition of the elusive concept
of "intrinsic dimensionality".
We also present a unified view of all the known proposals to organize
metric spaces, so as to be able to understand them under a common framework.
Most approaches turn out to be variations on a few different concepts. We
organize those works in a taxonomy which allows us to devise new algorithms
from combinations of concepts which were not noticed before because of the
lack of communication between different communities. We present
experiments validating our results and comparing the existing approaches.
We finish with recommendations for practitioners and open questions for future
development.