Improved and Extended Locating Functionality on Compressed Suffix Arrays
Simon Gog, Gonzalo Navarro, and Matthias Petri
Compressed Suffix Arrays (CSAs) offer the same functionality as classical
suffix arrays (SAs), and more, within space close to that of the compressed
text, and in addition they can reproduce any text fragment. Furthermore, their
pattern search
times are comparable to those of SAs. This combination has made CSAs extremely
successful substitutes for SAs on space-demanding applications. Their weakest
point is that they are orders of magnitude slower when retrieving the precise
positions of pattern occurrences. SAs have other well-known shortcomings,
inherited by CSAs, such as not retrieving those positions in a useful order.
In this paper we present new techniques that, on the one hand, improve the
current
space/time tradeoffs for retrieving pattern occurrences in CSAs, and on the
other, efficiently support extended pattern locating functionalities, such as
retrieving occurrences in text order or limiting the occurrences to within a
text
window. Our experimental results display considerable savings with respect to
the baseline techniques in many cases of interest: in some cases we outperform
their time by a factor of two or three.