Compression: A Key for Next-Generation Text Retrieval Systems

Nivio Ziviani, Edleno de Moura, Gonzalo Navarro and Ricardo Baeza-Yates.

Integrating text and index compression for rapid access to data poses a challenge: improving information retrieval systems with economical, flexible methods that keep the text compressed, facilitate modifications, and enhance searchability.

As online textual information explodes through the widespread use of digital libraries, office automation systems, document databases, and the Web, the need for effective information retrieval (IR) systems grows. The Web alone comprises approximately 800 million static pages containing 6 terabytes of plain text, roughly the text of a million books. Because text retrieval is the kernel of most IR systems, today's IR systems face the challenge of providing rapid access to this textual mass.

In this article, we discuss recent techniques that permit fast, direct searching of compressed text, and we explain how these techniques can improve the overall efficiency of IR systems.