Computing MEMs on Repetitive Text Collections
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern P[1..m] on a large repetitive text collection T[1..n],
which is represented as a (hopefully much smaller) run-length context-free
grammar of size g_rl. We show that the problem can be solved in time
O(m^2 log^e n), for any constant e > 0, on a data structure
of size O(g_rl). Further, on a locally consistent grammar of size
O(delta log (n/delta)), the time decreases to
O(m log m (log m + log^e n)).
The value delta is a function of the substring
complexity of T and Omega(delta log (n/delta)) is a tight lower
bound on the compressibility of repetitive texts T, so our structure has
optimal size in terms of n and delta.