Indexing Compressed Text
Edleno de Moura, Gonzalo Navarro and Nivio Ziviani
We present a technique to build an index based on suffix arrays for
compressed texts.
We also propose a compression scheme for textual databases based on words
that generates a compression code that preserves the lexicographical
ordering of the text words.
As a consequence it permits the sorting of the compressed strings to
generate the suffix array without decompressing.
As the compressed text is under 30% of the size of the original text
we are able to build the suffix array twice as fast on the compressed text.
The compressed text plus index is 55-60% of the size of the original text
plus index and search times are reduced to approximately half the time.
We also present analytical and experimental results for different variations
of the word-oriented compression paradigm.