Approximate String Matching on Ziv-Lempel Compressed Text
Juha Kärkkäinen, Gonzalo Navarro and Esko Ukkonen
We present the first nontrivial algorithm for approximate pattern matching on
compressed text. The format we choose is the Ziv-Lempel family. Given a text of
length u compressed into length n, and a pattern of length m, we report
all the R occurrences of the pattern in the text allowing up to k
insertions, deletions and substitutions. On LZ78/LZW we need O(mkn + R) time
in the worst case and O(k^2n + mk\min(n,(ms)^k) + R) on average where
s is the alphabet size. The experimental results show a practical
speedup over the basic approach of up to 2X for moderate m and small k.
We extend the algorithms to more general compression formats and approximate
matching models.