Fast Compressed Self-Indexes with Deterministic Linear-Time Construction
Ian Munro, Gonzalo Navarro, and Yakov Nekrich
We introduce a compressed suffix array representation that, on a text
T of length n over an alphabet of size s, can be built in
O(n)
deterministic time, within O(n log s) bits of working space, and counts
the number of occurrences of any pattern P in T in time
O(|P| + log log_w s) on a RAM machine of w = Omega(log n)-bit
words. This new index outperforms all the other compressed indexes that can
be built in linear deterministic time, and some others. The only
faster indexes can be built in linear time only in expectation, or require
Theta(n log n) bits.
We also show that, by using O(n log s) bits, we can build in
linear time an index that counts in time O(|P|/log_s n +
log n (log log n)^2), which is RAM-optimal for w = Theta(log n) and
sufficiently long patterns.