Re-Pair Achieves High-Order Entropy
Gonzalo Navarro and Luís Russo
Re-Pair is a dictionary-based compression method invented in 1999 by Larsson
and Moffat. Although its practical performance has been established through
experiments, the method has resisted all attempts of formal analysis. In this
paper we show that Re-Pair compresses a sequence T[1,n] over an alphabet
of size s and k-th order entropy Hk, to at most
2 n Hk + o(n log s) bits, for any k = o(log_s n).