L-systems for Measuring Repetitiveness
Gonzalo Navarro and Cristian Urbina
An L-system (for compression) is a deterministic context-free L-system
(without epsilon-rules) extended with two parameters d and
n and a coding t, which together unambiguously determine a string
w = t(phi^d(s))[1:n], where phi is the morphism of the
system and s is its axiom. The length of the shortest description of an
L-system generating w is denoted ell, and it is arguably a relevant measure of repetitiveness that builds on the self-similarities arising in the sequence.
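To make the definition concrete, the following is a minimal Python sketch of how such a system determines w. The function name `expand`, the dictionary encoding of phi and t, and the Fibonacci morphism used as an example are illustrative assumptions, not taken from the paper; the coding t is assumed to be a letter-to-letter map.

```python
def expand(rules, coding, axiom, d, n):
    """Return t(phi^d(s))[1:n] for an L-system given as plain dictionaries.

    rules  : dict symbol -> non-empty string (the morphism phi; hypothetical encoding)
    coding : dict symbol -> single symbol (the coding t)
    axiom  : the axiom s
    d, n   : number of iterations and length of the prefix to keep
    """
    # The definition forbids epsilon-rules, so every image must be non-empty.
    if any(len(image) == 0 for image in rules.values()):
        raise ValueError("epsilon-rules are not allowed")

    w = axiom
    for _ in range(d):
        # Apply phi symbol by symbol. Since every symbol expands to at least
        # one symbol, the first n symbols of phi(w) depend only on the first
        # n symbols of w, so truncating here does not change the result.
        w = "".join(rules[c] for c in w)[:n]

    # Apply the coding and cut to the requested length.
    return "".join(coding[c] for c in w)[:n]


# Illustrative example: the Fibonacci morphism a -> ab, b -> a.
fib_rules = {"a": "ab", "b": "a"}
identity = {"a": "a", "b": "b"}
print(expand(fib_rules, identity, "a", 5, 10))  # abaababaab
```

The sketch only evaluates a given system; finding the smallest system that generates a given w, which is what ell measures, is a different and much harder task that it does not address.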
In this paper we deepen the study of the measure ell and its
relation with delta, a better-established lower bound that builds on
substring complexity. Our results show that ell and delta are largely orthogonal, in the sense that either one can be much larger than the other, depending on the string family. This suggests that the two mechanisms capture different kinds of regularities related to repetitiveness.
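For reference, delta is usually defined as the maximum over k of d_k(w)/k, where d_k(w) is the number of distinct substrings of length k of w. The abstract does not spell this definition out, so the Python sketch below assumes it; the function name `substring_complexity_delta` is illustrative, and the quadratic-time enumeration is chosen for clarity rather than efficiency.

```python
def substring_complexity_delta(w):
    """Return max over k of (number of distinct length-k substrings of w) / k.

    Naive enumeration of all substrings; suitable only for short strings.
    """
    n = len(w)
    best = 0.0
    for k in range(1, n + 1):
        # Collect the distinct substrings of length k.
        distinct = {w[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


print(substring_complexity_delta("abaababaab"))  # 2.0
print(substring_complexity_delta("abcd"))        # 4.0
```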
Then, we show that the recently introduced NU-systems, which combine the
capabilities of L-systems with bidirectional macro-schemes, can be
asymptotically strictly smaller than both mechanisms for the same fixed string
family, which makes the size nu of the smallest NU-system the unique smallest reachable repetitiveness measure to date. We conclude that, to achieve better compression, one should combine morphism substitution with copy-paste mechanisms.