Improved Single and Multiple Approximate String Matching
Kimmo Fredriksson and Gonzalo Navarro
We present a new algorithm for multiple approximate string matching. It is
based on reading backwards enough l-grams from text windows so as to
prove that no occurrence can contain the part of the window read, and then
shifting the window. Three variants of the algorithm are presented, which
give different tradeoffs between how much they work in the window and how
much they shift it. We show analytically that two of our algorithms are optimal
on average. Compared to the first average-optimal multipattern approximate
string matching algorithm [Fredriksson and Navarro, CPM 2003], the new
algorithms are much faster and are optimal up to difference ratios of 1/2,
contrary to the maximum of 1/3 that could be reached in previous work.
This is also a contribution to the area of single-pattern approximate string
matching, as the only average-optimal algorithm [Chang and Marr, CPM 1994]
also reached a difference ratio of 1/3.
We show experimentally that our algorithms are very competitive, displacing the
long-standing best algorithms for this problem. On real life texts, our
algorithms are especially interesting for computational biology applications.