Integrating Contents and Structure in Text Retrieval

Ricardo Baeza-Yates and Gonzalo Navarro

The purpose of a textual database is to store textual documents. These documents have not only textual contents but also structure. Many traditional text database systems have focused on querying either by contents or by structure, but not both. Recently, a number of models integrating both types of queries have appeared. We argue in favor of that integration and focus our attention on these recent models, covering a representative sample of the proposals in the field. We pay special attention to the tradeoffs between expressiveness and efficiency, showing the compromises made by each model. We argue in favor of achieving a good compromise, since being weak in either of these two aspects makes a model useless for many applications.