A Compact Space Decomposition for Effective Metric Indexing
Edgar Chávez and Gonzalo Navarro
The metric space model abstracts many proximity search problems, from
nearest-neighbor classifiers to textual and multimedia information retrieval.
In this context, an index is a data structure that speeds up proximity
queries. However, indexes lose their efficiency as the intrinsic data
dimensionality increases. In this paper we present a simple index called
list of clusters (LC), which is based on a compact partitioning of the
data set. The LC is shown to require little space, to be suitable both for main
and secondary memory implementations, and most importantly, to be very resistant
to the intrinsic dimensionality of the data set. In this aspect our structure is
unbeaten. We finish with a discussion of the role of unbalancing in metric space
searching, and how it permits trading memory space for construction time.