Multiple Approximate String Matching by Counting

Gonzalo Navarro

We present a very simple and efficient algorithm for online multiple approximate string matching. It uses a previously known counting-based filter that searches for a single pattern by quickly discarding uninteresting parts of the text. Our multi-pattern algorithm is based on the simulation of many parallel filters using bits of the computer word. Our average complexity to search r patterns of length m is O (rnlog m / log n), being n is the text size. We can search patterns of different length, each one with a different number of errors. We show experimentally that our algorithm is competitive with the fastest known algorithms, being the fastest for a wide range of intermediate error ratios. We give the first average-case analysis of the filtering efficiency of the counting method, applicable also to single patterns.