An Effective Clustering Algorithm to Index High Dimensional Metric Spaces
Edgar Chávez and Gonzalo Navarro
A metric space consists of a collection of objects and a distance
function defined among them, which satisfies the triangular inequality. The
goal is to preprocess the set so that, given a set of objects and a query,
retrieve those objects close enough to the query. The number of distances
computed to achieve this goal is the complexity measure. The problem is very
difficult in the so-called high-dimensional metric spaces, where the
histogram of distances has a large mean and a small variance.
A recent survey on methods to index metric spaces has shown that the so-called
clustering algorithms are better suited than their competitors,
pivot-based algorithms, to cope with high-dimensional metric spaces.
In this paper we present a new clustering method that achieves much better
performance than all the existing data structures. We present analytical and
experimental results that support our claims and that give the users the
tuning parameters to make optimal use of this data structure.