###
RePair and All Irreducible Grammars are Upper Bounded by High-Order
Empirical Entropy

####
Carlos Ochoa and Gonzalo Navarro

Irreducible grammars are a class of context-free grammars with well-known
representatives, such as RePair (with a few tweaks), Longest
Match, Greedy, and Sequential. We show that a grammar-based
compression method described by Kieffer and Yang (2000) is upper bounded by
the high-order empirical entropy of the string when the
underlying grammar is irreducible. Specifically, given a string *S* over an
alphabet of size *s*, we prove that if the underlying grammar is
irreducible, then the length of the binary code output by this grammar-based
compression method is bounded by *|S| H_k(S) + o(|S| log s)* for any
*k* in *o(log_s |S|)*, where *H_k(S)*
is the *k*-order empirical entropy of *S*. This is the first bound
encompassing the whole class of irreducible grammars in terms of the
high-order empirical entropy, with coefficient 1.