A New Indexing Method for Approximate String Matching

Gonzalo Navarro and Ricardo Baeza-Yates

We present a new indexing method for the approximate string matching problem. The method is based on a suffix tree combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the retrieval time is O(n^l) for 0 whenever a < 1-e/sqrt(s) where a is the error level tolerated and s is the alphabet size. We experimentally show that this index outperforms by far all other algorithms for indexed approximate searching, also being the first experiments that compare the different existing schemes. We finally show how this index can be implemented using much less space.