Fast In-Memory XPath Search using Compressed Indexes
Diego Arroyuelo, Francisco Claude, Sebastian Maneth, Veli Mäkinen,
Gonzalo Navarro, Kim Nguyễn, Jouni Sirén, and Niko
Välimäki
XML documents consist of text data plus structured data
(mark-up).
XPath allows to query both, text and structure.
Evaluating such hybrid queries is challenging.
We present a system for in-memory evaluation
of XPath search queries, that is, queries with
text and structure predicates, yet without advanced features
such as backward axes, arithmetics, and joins.
We show that for this query fragment, which contains
Forward Core XPath, our system, dubbed
"SXSI", outperforms existing systems by 1-3 orders
of magnitude.
SXSI is based on state-of-the-art indexes for
text and structure data.
It combines two novelties. On one hand, it represents the
XML data in a compact indexed form, which allows it to handle larger
collections in main memory while supporting powerful search and navigation
operations over the text and the structure. On the other, it features an
execution engine that uses tree automata and
cleverly chooses evaluation orders that leverage the
speeds of the respective indexes.
SXSI is modular and allows seamless replacement
of its indexes. This is demonstrated through
experiments with (1) a text index specialized for
search of bio sequences, and
(2) a word-based text index specialized for
natural language search.