Improved and Extended Locating Functionality on Compressed Suffix Arrays

Simon Gog and Gonzalo Navarro

Compressed Suffix Arrays (CSAs) offer the same functionality as classical suffix arrays (SAs), and more, within space close to that of the compressed text, and in addition they replace the text. Furthermore, their pattern search times are comparable to those of SAs. This combination has made CSAs extremely successful substitutes for SAs on space-demanding applications. Their weakest point is that they are orders of magnitude slower when reporting the precise positions of pattern occurrences. SAs have other well-known shortcomings, inherited by CSAs, such as retrieving those positions in arbitrary order. In this paper we present new techniques that, on one hand, improve the current space/time tradeoffs for locating pattern occurrences on CSAs, and on the other, efficiently support extended pattern locating functionalities, such as reporting occurrences in text order or limiting the occurrences to within a text window. Our experimental results display considerable savings with respect to the baseline techniques.