Efficiently Decodable and Searchable Natural Language Adaptive Compression
Nieves Brisaboa, Antonio Fariña, Gonzalo Navarro, and José Paramá
We address the problem of adaptive compression of natural
language text, focusing on the case where little bandwidth is
available and the receiver has little processing power, as in
mobile applications. Our technique achieves compression ratios
around 32% and requires very little effort from the receiver. This
tradeoff, not previously achieved with alternative techniques,
is obtained by breaking the usual symmetry between sender and
receiver that dominates statistical adaptive compression. Moreover,
we show that our technique can be adapted to avoid decompression
altogether in cases where the receiver only wants to detect the presence
of some keywords in the document. This is useful in scenarios such as
selective dissemination of information, news clipping, alert
systems, text categorization, and clustering. Thanks to the asymmetry
we introduce, the receiver can search the compressed text much
faster than the plain text. This was previously achieved only in
semistatic compression scenarios.
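
As an illustration of the sender/receiver asymmetry described above, the following is a minimal sketch and not the coding scheme of the paper. It assumes a word-based model in which codes are plain integer ranks rather than byte-oriented codewords, and the message tags `NEW`, `CODE`, and `SWAP`, as well as the function names `encode`, `decode`, and `count_occurrences`, are hypothetical. The sender does all the frequency bookkeeping; the receiver only performs table lookups and swaps; the searcher keeps no vocabulary at all and just tracks the current code of one keyword.

```python
def encode(words):
    """Sender: counts frequencies and keeps the table sorted by frequency."""
    table, pos, freq = [], {}, {}
    out = []
    for w in words:
        if w not in pos:
            # First occurrence: transmit the word itself.
            pos[w] = len(table)
            table.append(w)
            freq[w] = 0
            out.append(("NEW", w))
        else:
            out.append(("CODE", pos[w]))
        freq[w] += 1
        # If w now outranks words ahead of it, move it to the front of
        # its old frequency group and tell the receiver to do the same.
        # (Linear scan for simplicity; assumed good enough for a sketch.)
        i = j = pos[w]
        while j > 0 and freq[table[j - 1]] < freq[w]:
            j -= 1
        if j != i:
            other = table[j]
            table[i], table[j] = other, w
            pos[w], pos[other] = j, i
            out.append(("SWAP", i, j))
    return out


def decode(stream):
    """Receiver: no statistics, only lookups, appends, and swaps."""
    table, text = [], []
    for msg in stream:
        if msg[0] == "NEW":
            table.append(msg[1])
            text.append(msg[1])
        elif msg[0] == "CODE":
            text.append(table[msg[1]])
        else:  # "SWAP"
            _, i, j = msg
            table[i], table[j] = table[j], table[i]
    return text


def count_occurrences(stream, keyword):
    """Searcher: counts a keyword without rebuilding the text or the table."""
    code, size, count = None, 0, 0
    for msg in stream:
        if msg[0] == "NEW":
            if msg[1] == keyword:
                code = size  # the new word receives the next free rank
                count += 1
            size += 1
        elif msg[0] == "CODE":
            if msg[1] == code:
                count += 1
        else:  # "SWAP": follow the keyword if its rank moved
            _, i, j = msg
            if code == i:
                code = j
            elif code == j:
                code = i
    return count


words = "a b a b b c a b".split()
stream = encode(words)
restored = decode(stream)
hits = count_occurrences(stream, "b")
```

In this sketch the searcher compares integers and reacts to swaps only, which hints at why searching the compressed stream can be cheaper than scanning plain text; the actual gains reported above depend on the authors' codeword design, which this example does not reproduce.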