Approximate String Matching with Lempel-Ziv Compressed Indexes
Luís Russo, Gonzalo Navarro, and Arlindo Oliveira
A compressed full-text self-index for a text T is a data structure
requiring reduced space and able of searching for patterns P in
T.
Furthermore, the structure can reproduce any substring of T, thus it
actually replaces T. Despite the explosion of interest on self-indexes in
recent years, there has not been much progress on search functionalities
beyond the basic exact search. In this paper we focus on indexed approximate
string matching (ASM), which is of great interest, say, in computational
biology applications. We present an ASM algorithm that works on top of a
Lempel-Ziv self-index. We consider the so-called hybrid indexes, which are
the best in practice for this problem. We show that a Lempel-Ziv index can be
seen as an extension of the classical q-samples index. We give new insights
on this type of index, which can be of independent interest, and then apply
them to the Lempel-Ziv index. We show experimentally that our algorithm has a
competitive performance and provides a useful space-time tradeoff compared to
classical indexes.