A Probabilistic Spell for the Curse of Dimensionality
Edgar Chávez and Gonzalo Navarro
Range searches in metric spaces can be very difficult if the space is "high
dimensional", i.e. when the histogram of distances has a large mean and a
small variance. The so-called "curse of dimensionality" (well known in vector
spaces) is also observed in metric spaces, and the term refers to the odd
situation where using an index for proximity searching may be worse (in total
elapsed time) than an exhaustive search.
There are at least two reasons behind the curse of dimensionality: a large
search radius and/or a high intrinsic dimension of the metric space.
We present a general probabilistic framework that allows us to reduce the
effective search radius of the search and which gets more effective as the
dimension grows. The technique is based on probabilistically stretching
the triangle inequality. We apply the technique to a particular class of
indexing algorithms, even though it could be generalized to any indexing
algorithm using the triangle inequality. We present an approximate analysis of
the technique which helps understand the process, as well as empirical evidence
showing dramatic improvements in the search time at the cost of a very small
error probability.