Fast Two-Dimensional Approximate Pattern Matching
Ricardo Baeza-Yates and Gonzalo Navarro
We address the problem of approximate string matching in two dimensions,
that is, to find a pattern of size m x m in a text of size
n x n with at most k errors (substitutions, insertions
and deletions).
Although the problem can be solved using dynamic programming in time
O (m^2 n^2), this is in general too expensive for small k. So we
design a filtering algorithm which avoids verifying most of the
text with dynamic programming. This filter is based on a one-dimensional
multi-pattern approximate search algorithm. The average complexity of our
resulting algorithm is O( n^2k log_sigma (m)/m^2) for k <
m(m+1)/(5log_sigma m), which is optimal and matches the best previous
result which allows only substitutions. For higher error levels, we present an
algorithm with time complexity O (n^2k/(wsqrt(sigma))) (where
w is the size in bits of the computer word and sigma the alphabet
size). This algorithm works for k < m(m+1)(1-e/sqrt(sigma)),
where e = 2.718....
These are the first good expected case algorithms for the problem.
Our algorithms work also for rectangular patterns and rectangular text and
can even be extended to the case where each row in the pattern and the text
have a different length.