Improved and Extended Locating Functionality on Compressed Suffix Arrays

Simon Gog, Gonzalo Navarro, and Matthias Petri

Compressed Suffix Arrays (CSAs) offer the same functionality as classical suffix arrays (SAs), and more, within space close to that of the compressed text, and in addition they can reproduce any text fragment. Furthermore, their pattern search times are comparable to those of SAs. This combination has made CSAs extremely successful substitutes for SAs on space-demanding applications. Their weakest point is that they are orders of magnitude slower when retrieving the precise positions of pattern occurrences. SAs have other well-known shortcomings, inherited by CSAs, such as not retrieving those positions in a useful order. In this paper we present new techniques that, on the one hand, improve the current space/time tradeoffs for retrieving pattern occurrences in CSAs, and on the other, efficiently support extended pattern locating functionalities, such as retrieving occurrences in text order or limiting the occurrences to within a text window. Our experimental results display considerable savings with respect to the baseline techniques in many cases of interest: in some cases we outperform their time by a factor of two or three.