###
A Metric Index for Approximate String Matching

####
Gonzalo Navarro and Edgar Chávez

We present a radically new indexing approach for approximate string matching.
The scheme uses the metric properties of the edit distance and can be applied
to any other metric between strings. We build a metric space where the sites
are the nodes of the suffix tree of the text, and the approximate query is seen
as a proximity query on that metric space. This permits us finding the *occ*
occurrences of a pattern of length *m*, permitting up to *r* differences, in a
text of length *n* over an alphabet of size *s*, in average time
*O(m^(1+e) + occ)* for any *e>0*, if *r = o(m/log_s(m))*
and *m > ((1+e)/e)log_s(n)*. The index works well up to
*r < (3-sqrt(2))m/log_s(m)*, where it achieves its maximum average
search complexity *O(m^(1+sqrt(2)+e)+occ)*. The construction time of
the index is *O(m^(1+sqrt(2)+e) n log n)* and its space is
*O(m^(1+sqrt(2)+e) n)*. This is the first index achieving average
search time polynomial in *m* and independent of *n*, for *r = O(m/log_s(m))*. Previous methods achieve this complexity only for *r=O(m/log_s(n))*.
We also present a simpler scheme needing *O(n)* space.