Space-Efficient Computation of the Burrows-Wheeler Transform
José Fuentes-Sepúlveda, Gonzalo Navarro and Yakov Nekrich
The Burrows-Wheeler Transform (BWT) has become an essential tool for
compressed text indexing. Computing it efficiently and within little space
is essential for the practicality of the indexes that build on it. A recent
algorithm (Munro, Navarro & Nekrich, SODA 2017) computes the BWT in
O(n) time using O(n log s) bits of space for a text of length
n over
an alphabet of size s. The result is of theoretical nature and its
practicality is far from obvious. In this paper we engineer their solution and
show that, while a basic implementation is slow in practice, the algorithm is
amenable to parallelization.
For a wide range of
alphabet sizes, our resulting implementation outperforms all the compact
constructions in the space/time tradeoff map.
On the smallest alphabets we are outperformed in time, but nevertheless
achieve the least space within reasonable time.
For example, in DNA sequences, the most widely used application of BWTs,
our construction uses 4.84 bits per base and builds
the BWT at a rate of 2.13 megabases per second, whereas the closest previous
alternative uses around 7.09 bits per base and runs at 4.17 megabases per
second.