Improved and Extended Locating Functionality on Compressed Suffix Arrays
Simon Gog and Gonzalo Navarro
Compressed Suffix Arrays (CSAs) offer the same functionality as classical
suffix arrays (SAs), and more, within space close to that of the compressed
text, and in addition they replace the text. Furthermore, their pattern search
times are comparable to those of SAs. This combination has made CSAs extremely
successful substitutes for SAs in space-demanding applications. Their weakest
point is that they are orders of magnitude slower when reporting the precise
positions of pattern occurrences. SAs have other well-known shortcomings,
inherited by CSAs, such as retrieving those positions in arbitrary order.
In this paper we present new techniques that, on the one hand, improve the
current space/time tradeoffs for locating pattern occurrences on CSAs, and on the
other, efficiently support extended pattern locating functionalities, such as
reporting occurrences in text order or limiting the occurrences to within a
text window. Our experimental results display considerable savings with
respect to the baseline techniques.