LZ78 Compression in Low Main Memory Space

Diego Arroyuelo, Rodrigo Canovas, Gonzalo Navarro, and Rajeev Raman

We present the first algorithms that perform the LZ78 compression of a text of length n over alphabet [1..s], whose output is z integers, using only O(z log s) bits of main memory. The algorithms read the input text from disk in a single pass, and write the compressed output to disk. The text can also be decompressed within the same main memory usage, which is unprecedented too. The algorithms are based on hashing and, under some simplifying assumptions, run in O(n) expected time. We experimentally verify that our algorithms use 2-9 times less time and/or space than previously implemented LZ78 compressors.