(S,C)-Dense Coding: An Optimized Compression Code for Natural
Language Text Databases.
Nieves Brisaboa, Antonio Fariña, Gonzalo Navarro and María Esteller
This work presents (s,c)-Dense Code, a new method for compressing
natural language texts. It generalizes a previous compression
technique, End-Tagged Dense Code, which obtains better compression
ratios as well as simpler and faster encoding than Tagged Huffman.
At the same time, (s,c)-Dense
Code is a prefix code that maintains the most interesting features
of Tagged Huffman Code with respect to direct search on the
compressed text. (s,c)-Dense Coding retains all the efficiency and
simplicity of Tagged Huffman while improving on its compression ratios.
We formally describe the (s,c)-Dense Code and show how to
compute the parameters s and c that optimize the compression for
a specific corpus.
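
To make the role of the two parameters concrete, here is a minimal sketch. It assumes a byte-oriented code in which s byte values act as "stoppers" (they end a codeword) and the remaining c = 256 - s values act as "continuers", with words ranked by decreasing frequency. The function names `encode`, `compressed_size` and `best_s` are hypothetical, and the exhaustive scan over s is a simplification for illustration, not necessarily the procedure the paper describes.

```python
from typing import List, Sequence


def encode(rank: int, s: int, c: int) -> List[int]:
    """Codeword (list of byte values) for the word with 0-based rank `rank`.

    Assumption: stoppers are the values [0, s), continuers are [s, s + c).
    A codeword is zero or more continuers followed by exactly one stopper.
    """
    if rank < 0 or s < 1 or c < 1:
        raise ValueError("rank must be >= 0 and s, c must be >= 1")
    code = [rank % s]              # last byte: a stopper
    x = rank // s
    while x > 0:
        x -= 1
        code.insert(0, s + x % c)  # preceding bytes: continuers
        x //= c
    return code


def compressed_size(freqs: Sequence[int], s: int, c: int) -> int:
    """Total bytes needed; `freqs` must be sorted by decreasing frequency.

    The first s words take 1 byte, the next s*c take 2 bytes,
    the next s*c*c take 3 bytes, and so on.
    """
    total, start, block, length = 0, 0, s, 1
    while start < len(freqs):
        total += length * sum(freqs[start:start + block])
        start += block
        block *= c
        length += 1
    return total


def best_s(freqs: Sequence[int], base: int = 256) -> int:
    """Exhaustively try every s in [1, base) with c = base - s.

    Returns the smallest s that minimizes the compressed size.
    """
    ordered = sorted(freqs, reverse=True)
    return min(range(1, base),
               key=lambda s: compressed_size(ordered, s, base - s))
```

Under these assumptions, `encode(rank, 128, 128)` corresponds to the End-Tagged Dense Code special case (s = c = 128), and `best_s(word_frequencies)` picks the split that suits a given frequency list; `word_frequencies` stands for whatever per-word counts a corpus yields.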
Our empirical results show that (s,c)-Dense Code improves on both
End-Tagged Dense Code and Tagged Huffman Code, and incurs only 0.5%
overhead over plain Huffman Code.