Approximating Optimal Bidirectional Macro Schemes

Luís Russo, Ana Correia, Gonzalo Navarro, and Alexandre Francisco

Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better compression on highly repetitive sequences. We consider the problem of approximating optimal bidirectional macro schemes. We describe a simulated annealing algorithm that usually, in a few iterations, converges to a 2-approximation of the optimal bidirectional macro scheme. We test our algorithm on a number of real and artificial repetitive texts and verify that it is efficient in practice and outperforms Lempel-Ziv, sometimes by a wide margin.