###
Proximity Searching in High Dimensional Spaces with a Proximity Preserving
Order

####
Edgar Chávez, Karina Figueroa, and Gonzalo Navarro

Kernel based methods (such as *k*-nearest neighbors classifiers) for AI tasks
translate the classification problem into a proximity search problem, in a
space that is usually very high dimensional. Unfortunately, no proximity
search algorithm does well in high dimensions. An alternative to overcome this
problem is the use of approximate and probabilistic algorithms, which trade
time for accuracy.
In this paper we present a new probabilistic proximity search algorithm. Its
main idea is to order a set of samples based on their distance to each element.
It turns out that the closeness between the order produced by an element and
that produced by the query is an excellent predictor of the relevance of the
element to answer the query.

The performance of our method is unparalleled. For example, for a *full*
128-dimensional dataset, it is enough to review 10% of the database to obtain
90% of the answers, and to review less than 1% to get 80% of the correct
answers. The result is more impressive if we realize that a full 128-dimensional
dataset may span thousands of dimensions of clustered data.
Furthermore, the concept of proximity preserving order opens a totally new
approach for both exact and approximated proximity searching.