Improving the Space Cost of k-NN Search in Metric Spaces by Using
Distance Estimators
Benjamin Bustos and Gonzalo Navarro
Similarity searching in metric spaces has a vast number of applications in
several fields like multimedia databases, text retrieval, computational
biology, and pattern recognition. In this context, one of the most important
similarity queries is the k nearest neighbor (k-NN) search. The
standard
best-first k-NN algorithm uses a lower bound on the distance to prune
objects during the search. Although optimal in several aspects, the
disadvantage of this method is that its space requirements for the priority
queue that stores unprocessed clusters can be linear in the database size.
Most of the optimizations used in spatial access methods (for example, pruning
using MinMaxDist) cannot be applied in metric spaces, due to the lack of
geometric properties. We propose a new k-NN algorithm that uses
\emph{distance estimators}, aiming to reduce the storage requirements of the
search algorithm. The method stays optimal, yet it can significantly prune the
priority queue without altering the output of the query. Experimental results
with synthetic and real datasets confirm the reduction in storage space of our
proposed algorithm, showing savings of up to 80\% of the original space
requirement.