A Hybrid Indexing Method for Approximate String Matching
Gonzalo Navarro and Ricardo Baeza-Yates
We present a new indexing method for the approximate string matching problem.
The method is based on a suffix array combined with a partitioning of the
pattern. We analyze the resulting algorithm and show that the average retrieval
time is O(n^lambda *log n), for some lambda>0 that depends on the error
fraction tolerated alpha and the alphabet size sigma. It is shown that
lambda<1 for approximately alpha < 1-e/sqrt(sigma), where e=2.718....
The space required is four times the text size, which is quite
moderate for this problem. We experimentally show that this index can
outperform by far all the existing alternatives for indexed approximate
searching. These are also the first experiments that compare the different
existing schemes.