Compressed Full-Text Indexes
Gonzalo Navarro and Veli Mäkinen
Full-text indexes provide fast substring search over large text collections.
A serious problem of these indexes has traditionally been their space
consumption. A recent trend is to develop indexes that exploit the
compressibility of the text, so that their size is a function of the compressed
text length. This concept has evolved into self-indexes, which in addition
contain enough information to reproduce any text portion, so they replace
the text. The exciting possibility of an index that takes space close to that
of the compressed text, replaces it, and in addition provides fast search over
it, has triggered a wealth of activity and produced surprising results in a very
short time, which radically changed the status of this area in less than five
years. The most successful indexes nowadays are able to obtain almost optimal
space and search time simultaneously.
In this paper we present the main concepts underlying self-indexes. We explain
the relationship between text entropy and regularities that show up in index
structures and permit compressing them. Then we cover the most relevant
self-indexes up to date, focusing on how they exploit text compressibility
to achieve compact structures that can efficiently solve various search
problems. Our aim is to give the background to understand and follow
the developments in this area.