Improved Antidictionary Based Compression
Maxime Crochemore and Gonzalo Navarro
The compression of binary texts using antidictionaries is a novel technique
based on the fact that some substrings (called "antifactors") never appear in
the text. Let sb be an antifactor, where b is its last bit.
Every time s appears in the text we know that the next bit is not b
and hence omit its representation. Since building the set of all antifactors
is space consuming at compression time, it is customary to limit the
maximum length of antifactors considered up to a constant k. Larger k yields
better compression of the text but requires more space at compression time.
In this paper we introduce the notion of almost antifactors, which are
strings that rarely appear in the text. More formally, almost antifactors are
strings that, if we consider them as antifactors and separately code their
occurrences as exceptions, the compression ratio improves. We show that almost
antifactors permit improving compression with a limited amount of main memory
to compress. Our experiments show that they obtain the same compression of
the classical algorithm using only 30%-55% of its memory space.