Computing MEMs on Repetitive Text Collections

Gonzalo Navarro

We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1..m] on a large repetitive text collection T[1..n], which is represented as a (hopefully much smaller) run-length context-free grammar of size g_rl. We show that the problem can be solved in time O(m^2 log^e n), for any constant e > 0, on a data structure of size O(g_rl). Further, on a locally consistent grammar of size O(delta log (n/delta)), the time decreases to O(m log m (log m + log^e n)). The value delta is a function of the substring complexity of T and Omega(delta log (n/delta)) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n and delta.