XXS: Efficient XPath Evaluation on Compressed XML Documents.
Nieves Brisaboa, Ana Cerdeira-Pena, and Gonzalo Navarro
The eXtensible Markup Language (XML) is acknowledged as the de facto standard
for semi-structured data representation and data exchange on the Web and in
many other scenarios. A well-known shortcoming of XML is its verbosity, which
increases manipulation, transmission, and processing costs. Various
structure-blind and structure-conscious compression techniques can be applied
to XML, and some are even access-friendly, meaning that the documents can be
efficiently accessed in compressed form.
Direct access is necessary to implement XPath and XQuery, the standard query
languages for exploiting the expressiveness of XML. While many theoretical and
practical proposals exist for solving XPath/XQuery operations on XML, only a
few are well integrated with a compression format that supports the required
access operations on the XML data. In this work we go one step further and
design a compression format for XML
collections that boosts the performance of XPath queries on the data. This is
done by designing compressed representations of the XML data that support
complex operations beyond plain data access, and these operations are
exploited to solve key components of XPath queries. Our system, called XXS,
is aimed at
XML collections containing natural language text, which are compressed to
35%-50% of their original size while supporting a large subset of XPath
operations in times competitive with, and often better than, those of the best
state-of-the-art systems that work on uncompressed representations.