Fast Two-Dimensional Approximate Pattern Matching

Ricardo Baeza-Yates and Gonzalo Navarro

We address the problem of approximate string matching in two dimensions, that is, to find a pattern of size m x m in a text of size n x n with at most k errors (substitutions, insertions and deletions). Although the problem can be solved using dynamic programming in time O (m^2 n^2), this is in general too expensive for small k. So we design a filtering algorithm which avoids verifying most of the text with dynamic programming. This filter is based on a one-dimensional multi-pattern approximate search algorithm. The average complexity of our resulting algorithm is O( n^2k log_sigma (m)/m^2) for k < m(m+1)/(5log_sigma m), which is optimal and matches the best previous result which allows only substitutions. For higher error levels, we present an algorithm with time complexity O (n^2k/(wsqrt(sigma))) (where w is the size in bits of the computer word and sigma the alphabet size). This algorithm works for k < m(m+1)(1-e/sqrt(sigma)), where e = 2.718.... These are the first good expected case algorithms for the problem. Our algorithms work also for rectangular patterns and rectangular text and can even be extended to the case where each row in the pattern and the text have a different length.