Multiple Approximate String Matching
Ricardo Baeza-Yates and Gonzalo Navarro
We present two new algorithms for on-line multiple approximate string matching.
These are extensions of previous algorithms that search for a single pattern.
The single-pattern version of the first one is based on the simulation with bits
of a non-deterministic
finite automaton built from the pattern and using the text as input.
To search for multiple patterns, we superimpose their automata, using the
result as a filter.
The second algorithm partitions the pattern in sub-patterns that are searched
with no errors, with a fast exact multipattern search algorithm. To handle
multiple patterns, we search the sub-patterns of all of them together.
The running time achieved is in both cases O (n) for moderate error level,
pattern length and number of patterns.
They adapt (with higher costs) to the other cases.
However, the algorithms differ in speed and thresholds of usefulness.
We analyze theoretically when the algorithms can be used, and show
experimentally that they are faster than previous solutions in a wide
range of cases.