Faster Run-Length Compressed Suffix Arrays

Nathaniel K. Brown, Travis Gagie, Giovanni Manzini, Gonzalo Navarro, and Marinella Sciortino

We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size s whose Burrows-Wheeler Transform (BWT) consists of r runs in O(r log (n/r) + r log s + s) bits such that later, given character a and the suffix-array (SA) interval for P, we can find the SA interval for a P in O (log r_a + log log n) time, where r_a is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for a P in only O (log r_a) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors by a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.