Proximal Nodes: A Model to Query Document Databases by Contents and Structure

Gonzalo Navarro and Ricardo Baeza-Yates

A model to query document databases by both their content and structure is presented. The goal is a query language that is expressive in practice and also efficiently implementable, two properties that previous work does not offer together. The key ideas of the model are a set-oriented query language based on operations on nearby structure elements of one or more hierarchies, together with content and structural indexing and bottom-up evaluation. The model is evaluated with respect to expressiveness and efficiency, showing that it provides a good trade-off between the two goals. Finally, it is shown how to extend the model to media other than text.
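
As an informal illustration of the ideas named above (set-oriented operations on nearby structure elements, content and structural indexes, and bottom-up evaluation), the following minimal Python sketch may help. It is not the authors' implementation: the names `Node`, `included_in`, `containing`, and `evaluate`, the tuple-based query format, and the dictionary indexes are all assumptions made for this example, and the quadratic containment tests stand in for whatever efficient algorithms the model actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A structure element or text match, seen as a segment of the text (hypothetical)."""
    kind: str
    start: int
    end: int


def included_in(inner, outer):
    """Nodes of `inner` whose segment lies inside some node of `outer`."""
    return [n for n in inner
            if any(o.start <= n.start and n.end <= o.end for o in outer)]


def containing(outer, inner):
    """Nodes of `outer` whose segment contains some node of `inner`."""
    return [o for o in outer
            if any(o.start <= n.start and n.end <= o.end for n in inner)]


def evaluate(query, structure_index, content_index):
    """Evaluate a query tree bottom-up: leaves read the indexes,
    internal operators combine the node sets of their operands."""
    op = query[0]
    if op == "struct":   # leaf: all nodes of one structural type
        return structure_index.get(query[1], [])
    if op == "text":     # leaf: all matches of one word
        return content_index.get(query[1], [])
    left = evaluate(query[1], structure_index, content_index)
    right = evaluate(query[2], structure_index, content_index)
    if op == "in":       # left nodes included in some right node
        return included_in(left, right)
    if op == "with":     # left nodes that contain some right node
        return containing(left, right)
    raise ValueError(f"unknown operator: {op}")


# Toy indexes (assumed data): two chapters, three sections, two matches of "model".
structure_index = {
    "chapter": [Node("chapter", 0, 99), Node("chapter", 100, 199)],
    "section": [Node("section", 10, 40), Node("section", 50, 90),
                Node("section", 110, 150)],
}
content_index = {"model": [Node("match", 20, 24), Node("match", 120, 124)]}

# Sections that contain the word "model".
sections_with_model = evaluate(
    ("with", ("struct", "section"), ("text", "model")),
    structure_index, content_index)

# Of those, the ones located in the second chapter only.
second_chapter_index = {"chapter": [Node("chapter", 100, 199)],
                        "section": structure_index["section"]}
in_second_chapter = evaluate(
    ("in", ("with", ("struct", "section"), ("text", "model")),
     ("struct", "chapter")),
    second_chapter_index, content_index)
```

In this sketch every operator takes and returns a set of nodes, so queries compose freely, and each subquery is evaluated once before its parent, which is the sense of "bottom-up" assumed here.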