Approximate String Matching over Ziv-Lempel Compressed Text
Juha Kärkkäinen, Gonzalo Navarro and Esko Ukkonen
We present the first solution to the open problem of performing approximate
pattern matching on compressed text. The format we choose is the Ziv-Lempel
family, specifically the LZ78 and LZW variants. Given a text of length u
compressed into length n, and a pattern of length m, we report
all the R occurrences of the pattern in the text allowing up to k
insertions, deletions and substitutions, in O(mkn + R) time. The
existence problem needs O(mkn) time. We also show that the algorithm can
be adapted to run in O(k^2n + min(mkn,m^2(ms)^k) + R) average time,
where s is the alphabet size. The experimental results show a
speedup over the basic approach for moderate m and small k.