### On the Approximation Ratio of Lempel-Ziv Parsing

#### Travis Gagie, Gonzalo Navarro and Nicola Prezza

Shannon's entropy is a clear lower bound for statistical compression. The
situation is not so well understood for dictionary-based compression. A
plausible lower bound is *b*, the least number of phrases
of a general bidirectional parse of a text, where phrases can be copied
from anywhere else in the text. Since computing *b* is NP-complete, a
popular gold standard is *z*, the number of phrases in the Lempel-Ziv
parse of the text, where phrases can be copied only from the left. While
*z* can be computed in linear time, almost nothing has been known for
decades about its approximation ratio with respect to *b*. In this paper
we prove that *z = O(b log(n/b))*, where *n* is the text length.
We also show that the bound is tight as a function of *n*, by exhibiting
a string family where *z = Ω(b log n)*. Our upper bound is obtained
by building a run-length context-free grammar based on a locally consistent
parsing of the text. Our lower bound is obtained by relating *b* to
*r*, the number of equal-letter runs in the Burrows-Wheeler transform
of the text. Along the way, we prove other relevant bounds between
compressibility measures.
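To make the measure *z* concrete, here is a minimal sketch of the greedy left-to-right Lempel-Ziv parse described above, where each phrase may be copied only from an earlier starting position. This is an illustrative quadratic-time implementation, not the linear-time algorithm the abstract refers to; the function name is ours.

```python
def lz_phrase_count(text: str) -> int:
    """Count the phrases z in a greedy left-to-right Lempel-Ziv parse.

    Each phrase is the longest prefix of the unparsed suffix that also
    occurs starting at an earlier position (a copy "from the left",
    possibly overlapping the phrase itself); if no such prefix exists,
    the phrase is a single fresh character.
    """
    n = len(text)
    i = 0  # start of the unparsed suffix
    z = 0  # number of phrases so far
    while i < n:
        # Find the longest l with text[i:i+l] occurring at some p < i.
        # An occurrence of length l+1 starting at p <= i-1 ends by i+l,
        # so searching within text[0:i+l] admits exactly those sources.
        l = 0
        while i + l < n and text.find(text[i:i + l + 1], 0, i + l) != -1:
            l += 1
        i += max(l, 1)  # copied phrase of length l, or one new character
        z += 1
    return z
```

For example, `lz_phrase_count("abababab")` parses the text as `a`, `b`, `ababab` (the last phrase overlaps its own source), giving *z = 3*.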