Improving the Space Cost of k-NN Search in Metric Spaces by Using Distance Estimators

Benjamin Bustos and Gonzalo Navarro

Similarity searching in metric spaces has a vast number of applications in several fields like multimedia databases, text retrieval, computational biology, and pattern recognition. In this context, one of the most important similarity queries is the k nearest neighbor (k-NN) search. The standard best-first k-NN algorithm uses a lower bound on the distance to prune objects during the search. Although optimal in several aspects, the disadvantage of this method is that its space requirements for the priority queue that stores unprocessed clusters can be linear in the database size. Most of the optimizations used in spatial access methods (for example, pruning using MinMaxDist) cannot be applied in metric spaces, due to the lack of geometric properties. We propose a new k-NN algorithm that uses \emph{distance estimators}, aiming to reduce the storage requirements of the search algorithm. The method stays optimal, yet it can significantly prune the priority queue without altering the output of the query. Experimental results with synthetic and real datasets confirm the reduction in storage space of our proposed algorithm, showing savings of up to 80\% of the original space requirement.