Compressed Text Indexes: From Theory to Practice
Paolo Ferragina, Rodrigo González, Gonzalo Navarro, and Rossano
Venturini
A compressed full-text self-index represents a text in a
compressed form and still answers queries efficiently. This
technology is a significant advance over the (full-)text indexing
techniques of the previous decade, whose indexes required several
times the size of the text. Although it is relatively new, this
algorithmic technology has matured to the point where theoretical
research is giving way to practical developments. Nonetheless, such
developments require significant programming skills, a deep
engineering effort,
and a strong algorithmic background to dig into the research
results. To date, only isolated implementations and focused
comparisons of compressed indexes have been reported, and they
lacked a common API, which prevented their reuse or deployment
within other applications.
The goal of this paper is to fill this gap. First, we present the
existing implementations of compressed indexes from a practitioner's
point of view. Second, we introduce the Pizza&Chili site,
which offers tuned implementations and a standardized API for the
most successful compressed full-text self-indexes, together with
effective test-beds and scripts for their automatic validation and
testing. Third, we show the results of our extensive experiments on
these codes with the aim of demonstrating the practical relevance of
this novel algorithmic technology.