Improvements and Extra Material
Each comment is preceded by the page number.
- 86
A new technique by Katoh and Goto [ref] makes Section 3.4.1
probably obsolete, as they managed to store initializable arrays using only
1 extra bit in total.
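For context, the classic folklore solution to initializable arrays (which spends O(n) extra words, far from the 1 extra bit above) can be sketched as follows. This is our own illustrative sketch, not the book's pseudocode; the class and method names are ours, and of course a Python list is already initialized, so the point is only to show the validity test.

```python
class InitializableArray:
    """Folklore constant-time initializable array, O(n) extra words.

    Cell i is considered written iff backptr[i] points into the stack
    of written positions and that stack entry points back at i.
    """

    def __init__(self, n, init_value=0):
        self.init_value = init_value
        self.data = [0] * n      # raw values, never cleared
        self.backptr = [0] * n   # index into `written` (may be stale)
        self.written = []        # stack of positions written so far

    def _is_written(self, i):
        j = self.backptr[i]
        return j < len(self.written) and self.written[j] == i

    def read(self, i):
        return self.data[i] if self._is_written(i) else self.init_value

    def write(self, i, value):
        if not self._is_written(i):
            self.backptr[i] = len(self.written)
            self.written.append(i)
        self.data[i] = value

    def reset(self, init_value=0):
        """O(1) re-initialization: just empty the stack."""
        self.init_value = init_value
        self.written = []
```

The `reset` operation is what makes the structure interesting: all n cells revert to `init_value` in constant time, at the price of the two auxiliary arrays that the 1-extra-bit result above avoids.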
- 99
In Section 4.7, there are improved ways to represent inverted indexes using
very sparse bitvectors [ref].
- 162
In Section 6.6, in the bibliographic discussion on wavelet trees, we can
now add that it is possible to represent the Huffman-like model of wavelet
matrices in O(sigma log log n) bits, while encoding and decoding in
time proportional to the codeword length in bits [ref].
- 226
In "Remaining Operations" we could mention that, in order to compute height,
it is not necessary to compute deepestnode. Just as on page 183 we computed
minExcess as a previous step to compute rmq, we could analogously
compute maxExcess, which suffices to find the height of a node without
the extra work needed to compute rMq.
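To make the analogy concrete, here is a naive sketch (ours, not the book's) of how a maximum-excess scan yields the height of a node in a balanced-parentheses sequence: height(v) = maxExcess(v, close(v)) − excess(v). The helper names are our own, and a real implementation would of course answer these queries with the rmM-tree rather than by scanning.

```python
def close_pos(B, v):
    """Position of the ')' matching the '(' at B[v] (naive scan)."""
    d = 0
    for i in range(v, len(B)):
        d += 1 if B[i] == '(' else -1
        if d == 0:
            return i

def max_excess(B, i, j):
    """Maximum excess (opens minus closes, counted from the start
    of B) over the prefixes ending at positions i..j."""
    e = sum(1 if c == '(' else -1 for c in B[:i])
    m = float('-inf')
    for k in range(i, j + 1):
        e += 1 if B[k] == '(' else -1
        m = max(m, e)
    return m

def height(B, v):
    """Height of the node whose '(' is at position v: the maximum
    excess inside its subtree minus the node's own depth."""
    depth_v = sum(1 if c == '(' else -1 for c in B[:v + 1])
    return max_excess(B, v, close_pos(B, v)) - depth_v
```

For example, on B = "((()())())" the root (position 0) has height 2, its first child (position 1) has height 1, and the leaf at position 2 has height 0.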
- 240
An interesting note on Section 8.3.1 is that, if we represent the bitvector
S using the technique for very sparse bitvectors (Section 4.4, with
components H and L), then H corresponds exactly to the
DFUDS sequence (which therefore does not need to be stored separately!).
Further, H and L correspond exactly to the bitvector B
and the plain representation of sequence S of the next Section 8.4.
That is, the idea of Section 8.3.1, removing the redundant DFUDS sequence
and representing the remaining bitvector as a very sparse one, yields exactly
the simple representation for labeled trees described in Section 8.4.
The spaces are also identical,
n log σ + 2n + o(n) bits.
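As a reminder of how the H and L components arise, here is a small sketch (ours; function names are not from the book) of the Elias-Fano-style split behind the very sparse bitvector representation: each 1-position is cut into high and low bits, H encodes the high parts in unary, and L concatenates the low parts.

```python
def encode_sparse(positions, n):
    """Split a very sparse bitvector, given by its sorted 1-positions
    over [0, n), into components H and L (Elias-Fano style)."""
    m = len(positions)
    low_bits = max((n // max(m, 1)).bit_length() - 1, 0)
    L = [p & ((1 << low_bits) - 1) for p in positions]
    H = []
    bucket = 0
    for p in positions:
        high = p >> low_bits
        H.extend([0] * (high - bucket))  # a 0 closes each high bucket
        H.append(1)                      # a 1 per stored position
        bucket = high
    H.extend([0] * (((n - 1) >> low_bits) + 1 - bucket))
    return H, L, low_bits

def select1(H, L, low_bits, k):
    """Position of the k-th (0-based) 1 in the original bitvector:
    the number of 0s before the k-th 1 in H gives its high part."""
    ones, zeros = -1, 0
    for bit in H:
        if bit == 1:
            ones += 1
            if ones == k:
                return (zeros << low_bits) | L[k]
        else:
            zeros += 1
```

In the correspondence above, H plays the role of the DFUDS sequence (or of bitvector B in Section 8.4), while L stores the plain labels.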
- 269
In Section 8.5.7, Algorithm 8.22, line 12, we could use instead
fwdsearch(T,v-1,k+1), which simplifies things: line 10 disappears and
line 9 just says list(v,k).
- 274
In Section 8.7, Grammar compression, it turns out that the technique of
González et al. (2014) we mention achieves the same asymptotic
compression as that of Tabei et al. (2013) if we first convert C
into rules of R. This alternative is also likely to perform better in
practice, as it uses one rank operation instead of one select per
recursively extracted rule. A similar representation is presented more clearly in a more recent
article on online grammar compression [ref].
Another point is that there are newer results on the smallest grammar and
the competitiveness of RePair [ref].
- 295
There is, indeed, an analysis that shows that the space is reduced on
clustered graphs; see Theorem 1 of [ref].
- 304
In Section 9.3, Algorithm 9.9, neigh can be enhanced so that we
extract the jth neighbor efficiently, using a technique similar to
Algorithm 6.16 for range quantile queries.
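The range quantile machinery that such an enhancement would reuse can be sketched as follows. This is our own pointer-based toy version (the book's wavelet trees are levelwise and pointerless): quantile(i, j, k) descends once from the root, going left or right depending on how many symbols of S[i..j] map to the left child.

```python
class WaveletTree:
    """Toy pointer-based wavelet tree over an integer sequence.
    quantile(i, j, k) returns the k-th smallest value (0-based k,
    inclusive range), as in range quantile queries."""

    def __init__(self, seq, lo=None, hi=None):
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        if self.lo == self.hi:          # leaf: a single symbol
            self.left = self.right = None
            return
        mid = (self.lo + self.hi) // 2
        # ranks[p] = how many of seq[0..p-1] go to the left child
        self.ranks = [0]
        for x in seq:
            self.ranks.append(self.ranks[-1] + (x <= mid))
        self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

    def quantile(self, i, j, k):
        if self.lo == self.hi:
            return self.lo
        left_count = self.ranks[j + 1] - self.ranks[i]
        if k < left_count:              # answer lies in the left child
            return self.left.quantile(self.ranks[i],
                                      self.ranks[j + 1] - 1, k)
        return self.right.quantile(i - self.ranks[i],
                                   j - self.ranks[j + 1], k - left_count)
```

Run on the row of a wavelet-tree-encoded adjacency list, the same single descent retrieves the jth (smallest-labeled) neighbor without decoding the whole row.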
- 307
In Section 9.4, there are simpler ways to build a planar graph that do not
change the embedding and still efficiently support most of the operations
[ref].
- 432
A very interesting extension of FM-indexes, which would fit well here, is the
representation of de Bruijn graphs for sequence assembly
[ref]
[ref]
[ref]
- 497
In Section 12.9, there are recent dynamic trie implementations based on
hashing that perform well in practice
[ref]
[ref].
Further, it might be more efficient to implement dynamic k^2-trees
using dynamic tries, instead of building on a dynamic LOUDS representation.
Finally, Tsur has recently improved some further dynamic tree operations to
run in time O(log n / log log n)
[ref].
- 504
In Section 13.1.4, there are new results about nearest smaller values
[ref]
and nearest larger values [ref].
- 507
In Section 13.1.4, there is a new tau-majority encoding that achieves the same
asymptotic space as the one we describe in the book, but has optimal query
time, O(1/tau) [ref].
They also show that this time is optimal in a more general sense than
previously known, and that obtaining output-sensitive time is unlikely.
- 508
In Section 13.2, there is new research about indexing repetitive data that
is not just text or trees. For example, graphs [ref] or grids
[ref].
- 511
In Section 13.2.1, there is a new LZ76 parsing algorithm that runs in
O(n) time using O(n log sigma) bits of space [ref].
Also, the algorithms by Kärkkäinen, Kempa, and Puglisi have now appeared
in a journal [ref].
- 516
At the end of Section 13.2.2, we mention that we could use the simpler LZ76
parsing that allows overlaps between sources and targets.
Here we describe a simple variant of such a parsing.
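A minimal sketch of such a parsing (our own naive quadratic version, not an efficient algorithm): each phrase is the longest prefix of the remaining text that also starts at some earlier position, plus one explicit character, and the copy may overlap the text being produced.

```python
def lz76_parse(T):
    """Greedy LZ76-style parse allowing source/target overlap.
    Each phrase is (source_pos, copy_length, explicit_char);
    source_pos is -1 when nothing is copied."""
    phrases, i, n = [], 0, len(T)
    while i < n:
        best_len, best_src = 0, -1
        for s in range(i):
            l = 0
            # the source may run past i, i.e., overlap the target
            while i + l < n - 1 and T[s + l] == T[i + l]:
                l += 1
            if l > best_len:
                best_len, best_src = l, s
        phrases.append((best_src, best_len, T[i + best_len]))
        i += best_len + 1
    return phrases

def lz76_decode(phrases):
    """Left-to-right decoding; copying one character at a time
    makes overlapping sources work automatically."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return ''.join(out)
```

For example, "aaaaa" parses into just two phrases, the second copying three characters from a source that overlaps its own target: [(-1, 0, 'a'), (0, 3, 'a')].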
- 520
In footnote 7, the thesis of Nicola Prezza shows a string family where z =
Theta (r log n) (Fibonacci strings) and another where r =
Theta (z log n) (de Bruijn sequences), where z refers to the
original LZ76 parsing. The example given in the book is not sufficient to
conclude that r may be Theta (z sqrt(n)).
- 521
The main problem of run-length compressed suffix arrays, that is, their
sampling, was finally solved in 2018
[ref].
- 521
At the end of subsection "Sampling", there is an improved version of the
suffix array of an alignment
[ref].
Something to add? Please let us know
and enter our Hall of Fame!