Re-Pair Achieves High-Order Entropy

Gonzalo Navarro and Luís Russo

Re-Pair is a dictionary-based compression method invented in 1999 by Larsson and Moffat. Although its practical performance has been established through experiments, the method has resisted all attempts of formal analysis. In this paper we show that Re-Pair compresses a sequence T[1,n] over an alphabet of size s and k-th order entropy Hk, to at most 2 n Hk + o(n log s) bits, for any k = o(log_s n).