###
Speeding up Spatial Approximation Search in Metric Spaces

####
Karina Figueroa, Edgar Chávez, Gonzalo Navarro, and Rodrigo Paredes

Proximity searching consists in retrieving from a database those elements that
are similar to a query object. The usual model for proximity searching is a
metric
space where the distance, which models the proximity, is expensive to compute.
An index uses precomputed distances to speed up query processing. Among all
the
known indices, the baseline for performance for about twenty years has been
AESA. This index uses an iterative procedure, where at each iteration it first
chooses the next *promising* element ("pivot") to compare to the query,
and then it discards database elements that can be proved not relevant to the
query using the pivot. The next pivot in AESA is chosen as the one minimizing
the sum of lower bounds to the distance to the query proved by previous
pivots.
In this paper we introduce the new index *iAESA*, which establishes a new
performance baseline for metric space searching. The difference with AESA is
the method to select the next pivot. In iAESA, each candidate sorts previous
pivots by closeness to it, and chooses the next pivot as the candidate whose
order is most similar to that of the query. We also propose a modification to
AESA-like algorithms to turn them into probabilistic algorithms.
Our empirical results confirm a consistent improvement in query performance.
For example, we perform as few as 60% of the distance evaluations of AESA in
a
database of documents, a very important and difficult real-life instance of
the
problem. For the probabilistic algorithm, we perform in a database of faces up
to 40% of the comparisons made by the best alternative algorithm to retrieve
the same percentage of the correct answer. Based on the empirical results we
conjecture that the new probabilistic AESA-like algorithms will become, as
AESA
had been for exact algorithms, a reference point establishing in practice a
lower bound on how good a probabilistic proximity search algorithm can be.