On-line Approximate String Matching with Bounded Errors
Marcos Kiwi, Gonzalo Navarro, and Claudio Telha
We introduce a new dimension to the widely studied on-line approximate string
matching problem, by introducing an {\em error threshold} parameter e
so that the algorithm is allowed to miss occurrences with probability
e. This is particularly appropriate for this problem, as approximate
searching is used to model many cases where exact answers are not mandatory.
We show that the relaxed version of the problem allows us breaking the
average-case optimal lower bound of the classical problem, achieving average
case O(n log_s(m)/m) time with any e = poly(k/m), where
n is the text size, m the pattern length, k the number of
errors for edit distance, and s the alphabet size. Our experimental
results show the practicality of this novel and promising research direction.