A Compact Space Decomposition for Effective Metric Indexing

Edgar Chávez and Gonzalo Navarro

The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that speeds up proximity queries. However, indexes lose their efficiency as the intrinsic data dimensionality increases. In this paper we present a simple index called list of clusters (LC), which is based on a compact partitioning of the data set. The LC is shown to require little space, to be suitable both for main and secondary memory implementations, and most importantly, to be very resistant to the intrinsic dimensionality of the data set. In this aspect our structure is unbeaten. We finish with a discussion of the role of unbalancing in metric space searching, and how it permits trading memory space for construction time.