Compressed Text Indexes with Fast Locate
Rodrigo González and Gonzalo Navarro
Compressed text (self-)indexes have matured up to a point where they can replace
a text by a data structure that requires less space and, in addition to giving
access to arbitrary text passages, support indexed text searches. At this point
those indexes are competitive with traditional text indexes (which are very
large) for counting the number of occurrences of a pattern in the text.
Yet, they are still hundreds to thousands of times slower when it comes to
locating those occurrences in the text. In this paper we introduce a new
compression scheme for suffix arrays which permits locating the occurrences
extremely fast, while still being much smaller than classical indexes. In
addition, our index permits a very efficient secondary memory implementation,
where compression permits reducing the amount of I/O needed to answer queries.