Approximate Searching on Compressed Text

Carlos Avendaño, Claudia Feregrino and Gonzalo Navarro.

The approximate searching problem on compressed text consists of finding all occurrences of a pattern in a compressed text without decompressing it, where an occurrence may differ from the pattern by a limited number of differences. This problem has diverse applications in information retrieval, computational biology and signal processing, among others. One of the best existing solutions performs a multipattern search for a set of pieces of the pattern, followed by local decompression and direct verification in the decompressed areas. In this work we present an improvement to the verification phase of this solution: instead of decompressing the candidate areas and searching them for the pattern, we construct bit-parallel automata that recognize the pattern. In this way, the entire search is performed without decompressing the text, and the resulting times are competitive with those of decompressing the text and searching it with the best existing algorithms.
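The filter-then-verify scheme described above can be illustrated with a minimal sketch on plain, uncompressed text. It does not reproduce the compressed-domain verification that is the contribution of this work. It assumes that differences are measured as edit distance with at most k errors, that the pattern is split into k+1 pieces (so at least one piece must appear unaltered in any occurrence), and that a textbook Wu-Manber style bit-parallel automaton is used for verification. The function names `split_pieces`, `approx_search` and `filter_and_verify` are hypothetical, and `str.find` merely stands in for a real multipattern search.

```python
def split_pieces(pattern, k):
    """Split pattern into k+1 non-empty pieces; return (offset, piece) pairs."""
    m, n = len(pattern), k + 1
    if m < n:
        raise ValueError("pattern must have at least k+1 characters")
    bounds = [i * m // n for i in range(n + 1)]
    return [(bounds[i], pattern[bounds[i]:bounds[i + 1]]) for i in range(n)]


def approx_search(pattern, text, k):
    """Bit-parallel NFA simulation (Wu-Manber style, an assumed choice).

    Returns the sorted 0-based positions j such that some substring of
    text ending at j is within edit distance k of pattern.
    """
    m = len(pattern)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    # Bit i of R[d]: pattern[:i+1] matches a suffix of the text read so far
    # with at most d errors. Initially d leading characters may be deleted.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for j, c in enumerate(text):
        B = masks.get(c, 0)
        prev_old = R[0]                      # value of R[d-1] before this step
        R[0] = ((R[0] << 1) | 1) & B
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & B)   # matching character
                    | prev_old               # extra character in the text
                    | (prev_old << 1)        # substituted character
                    | (R[d - 1] << 1)        # missing pattern character
                    | 1) & full
            prev_old = old
        if R[k] & accept:
            ends.append(j)
    return ends


def filter_and_verify(pattern, text, k):
    """Find exact piece occurrences, then verify only the surrounding areas."""
    m = len(pattern)
    ends = set()
    for offset, piece in split_pieces(pattern, k):
        p = text.find(piece)
        while p != -1:
            # Any occurrence containing this piece lies inside this window.
            lo = max(0, p - offset - k)
            hi = min(len(text), p - offset + m + k)
            for e in approx_search(pattern, text[lo:hi], k):
                ends.add(lo + e)
            p = text.find(piece, p + 1)
    return sorted(ends)
```

For example, `filter_and_verify("survey", "surgery", 2)` reports position 6 among its results, since "surgery" is at edit distance 2 from "survey". The automaton is run only on the windows around exact piece matches, which is the step this work carries out directly on the compressed representation instead of on decompressed text.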