Computing MEMs and Relatives on Repetitive Text Collections
Gonzalo Navarro
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern P[1..m] on a large repetitive text collection T[1..n]
over an alphabet of size σ, which is represented as a (hopefully
much smaller) run-length context-free grammar of size g_rl. We show that
the problem can be solved in time O(m^2 log^e n), for any constant
e > 0, on a data structure of size O(g_rl). Further, on a
locally consistent grammar of size O(δ log (n log σ / (δ
log n))), the time decreases to O(m log m (log m + log^e n)).
The value δ is a function of the substring complexity of T
and Omega(δ log (n log σ / (δ log n))) is a tight
lower bound on the compressibility of repetitive texts T, so our
structure has optimal size in terms of n, σ, and
δ. We extend our results to several related problems, such as
finding k-MEMs, MUMs, rare MEMs, and applications.