Indexing Text with Approximate q-grams
Gonzalo Navarro, Erkki Sutinen, Jani Tanninen and Jorma Tarhio
We present a new index for approximate string matching. The index collects
text q-samples, i.e. disjoint text substrings of length q, at fixed
intervals and stores their positions. At search time, part of the text is
filtered out by noticing that any occurrence of the pattern must be
reflected in the presence of some text q-samples that match approximately
inside the pattern. We show experimentally that the parameterization mechanism o
f the related filtration scheme allows to increase the error level for which the
filtration is still efficient.