Improved Compressed String Dictionaries
Nieves Brisaboa, Ana Cerdeira, Guillermo de Bernardo, and Gonzalo Navarro
We introduce a new family of compressed data structures
to efficiently store and query large string dictionaries in main
memory. Our main technique is a combination of hierarchical
Front-coding with ideas from longest-common-prefix computation in
suffix arrays. Our data structures
yield relevant space-time tradeoffs in real-world dictionaries. We focus
on two domains where string dictionaries are extensively used and efficient
compression is required: URL collections, a key element in Web graphs and
applications such as Web mining; and collections of URIs and literals,
the basic components of RDF datasets. Our experiments show that our data
structures achieve better compression than the state-of-the-art alternatives
while providing very competitive query times.