A Probabilistic Spell for the Curse of Dimensionality

Edgar Chávez and Gonzalo Navarro

Range searches in metric spaces can be very difficult if the space is "high dimensional", i.e. when the histogram of distances has a large mean and a small variance. The so-called "curse of dimensionality" (well known in vector spaces) is also observed in metric spaces, and the term refers to the odd situation where using an index for proximity searching may be worse (in total elapsed time) than an exhaustive search. There are at least two reasons behind the curse of dimensionality: a large search radius and/or a high intrinsic dimension of the metric space. We present a general probabilistic framework that allows us to reduce the effective search radius of the search and which gets more effective as the dimension grows. The technique is based on probabilistically stretching the triangle inequality. We apply the technique to a particular class of indexing algorithms, even though it could be generalized to any indexing algorithm using the triangle inequality. We present an approximate analysis of the technique which helps understand the process, as well as empirical evidence showing dramatic improvements in the search time at the cost of a very small error probability.