Improvements and Extra Material
Each comment is preceded by the page number.
- 86
A new technique by Katoh and Goto [ref] makes Section 3.4.1
probably obsolete, as they managed to store initializable arrays using only
1 extra bit in total.
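For context, the classic folklore solution to initializable arrays (which spends O(n) extra words, far from the 1 extra bit above) can be sketched as follows. This is our own illustrative sketch, not the book's pseudocode; the class and method names are ours, and of course a Python list is already initialized, so the point is only to show the validity test.

```python
class InitializableArray:
    """Folklore constant-time initializable array, O(n) extra words.

    Cell i is considered written iff backptr[i] points into the stack
    of written positions and that stack entry points back at i.
    """

    def __init__(self, n, init_value=0):
        self.init_value = init_value
        self.data = [0] * n      # raw values, never cleared
        self.backptr = [0] * n   # index into `written` (may be stale)
        self.written = []        # stack of positions written so far

    def _is_written(self, i):
        j = self.backptr[i]
        return j < len(self.written) and self.written[j] == i

    def read(self, i):
        return self.data[i] if self._is_written(i) else self.init_value

    def write(self, i, value):
        if not self._is_written(i):
            self.backptr[i] = len(self.written)
            self.written.append(i)
        self.data[i] = value

    def reset(self, init_value=0):
        """O(1) re-initialization: just empty the stack."""
        self.init_value = init_value
        self.written = []
```

The `reset` operation is what makes the structure interesting: all n cells revert to `init_value` in constant time, at the price of the two auxiliary arrays that the 1-extra-bit result above avoids.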
- 99
In Section 4.7, there are improved ways to represent inverted indexes using
very sparse bitvectors [ref].
- 162
In Section 6.6, in the bibliographic discussion on wavelet trees, we can
now add that it is possible to represent the Huffman-like model of wavelet
matrices in O(sigma log log n) bits, while encoding and decoding in
time proportional to the codeword length in bits [ref].
- 226
In "Remaining Operations" we could mention that, in order to compute height,
it is not necessary to compute deepestnode. Just as on page 183 we computed
minExcess as a previous step to compute rmq, we could analogously
compute maxExcess, which suffices to find the height of a node without
the extra work needed to compute rMq.
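To make the analogy concrete, here is a naive sketch (ours, not the book's) of how a maximum-excess scan yields the height of a node in a balanced-parentheses sequence: height(v) = maxExcess(v, close(v)) − excess(v). The helper names are our own, and a real implementation would of course answer these queries with the rmM-tree rather than by scanning.

```python
def close_pos(B, v):
    """Position of the ')' matching the '(' at B[v] (naive scan)."""
    d = 0
    for i in range(v, len(B)):
        d += 1 if B[i] == '(' else -1
        if d == 0:
            return i

def max_excess(B, i, j):
    """Maximum excess (opens minus closes, counted from the start
    of B) over the prefixes ending at positions i..j."""
    e = sum(1 if c == '(' else -1 for c in B[:i])
    m = float('-inf')
    for k in range(i, j + 1):
        e += 1 if B[k] == '(' else -1
        m = max(m, e)
    return m

def height(B, v):
    """Height of the node whose '(' is at position v: the maximum
    excess inside its subtree minus the node's own depth."""
    depth_v = sum(1 if c == '(' else -1 for c in B[:v + 1])
    return max_excess(B, v, close_pos(B, v)) - depth_v
```

For example, on B = "((()())())" the root (position 0) has height 2, its first child (position 1) has height 1, and the leaf at position 2 has height 0.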
- 240
An interesting note on Section 8.3.1 is that, if we represent the bitvector
S using the technique for very sparse bitvectors (Section 4.4, with
components H and L), then H corresponds exactly to the
DFUDS sequence (which therefore does not need to be stored separately!).
Further, H and L correspond exactly to the bitvector B
and the plain representation of sequence S of the next Section 8.4.
That is, the idea of Section 8.3.1, removing the redundant DFUDS sequence
and representing the remaining bitvector as a very sparse one, yields exactly
the simple representation for labeled trees described in Section 8.4.
The spaces are also identical,
n log σ + 2n + o(n) bits.
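As a reminder of how the H and L components arise, here is a small sketch (ours; function names are not from the book) of the Elias-Fano-style split behind the very sparse bitvector representation: each 1-position is cut into high and low bits, H encodes the high parts in unary, and L concatenates the low parts.

```python
def encode_sparse(positions, n):
    """Split a very sparse bitvector, given by its sorted 1-positions
    over [0, n), into components H and L (Elias-Fano style)."""
    m = len(positions)
    low_bits = max((n // max(m, 1)).bit_length() - 1, 0)
    L = [p & ((1 << low_bits) - 1) for p in positions]
    H = []
    bucket = 0
    for p in positions:
        high = p >> low_bits
        H.extend([0] * (high - bucket))  # a 0 closes each high bucket
        H.append(1)                      # a 1 per stored position
        bucket = high
    H.extend([0] * (((n - 1) >> low_bits) + 1 - bucket))
    return H, L, low_bits

def select1(H, L, low_bits, k):
    """Position of the k-th (0-based) 1 in the original bitvector:
    the number of 0s before the k-th 1 in H gives its high part."""
    ones, zeros = -1, 0
    for bit in H:
        if bit == 1:
            ones += 1
            if ones == k:
                return (zeros << low_bits) | L[k]
        else:
            zeros += 1
```

In the correspondence above, H plays the role of the DFUDS sequence (or of bitvector B in Section 8.4), while L stores the plain labels.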
- 269
In Section 8.5.7, Algorithm 8.22, line 12, we could use instead
fwdsearch(T,v-1,k+1), which simplifies things: line 10 disappears and
line 9 just says list(v,k).
- 274
In Section 8.7, Grammar compression, it turns out that the technique of
González et al. (2014) we mention achieves the same asymptotic
compression as that of Tabei et al. (2013) if we first convert C
into rules of R. This alternative is also likely to perform better in
practice, as it uses one rank operation instead of one select per
recursively extracted rule. A similar representation is presented more clearly in a more recent
article on online grammar compression [ref].
Another point is that there are newer results on the smallest grammar and
the competitiveness of RePair [ref].
- 295
There is, indeed, an analysis that shows that the space is reduced on
clustered graphs; see Theorem 1 of [ref].
- 304
In Section 9.3, Algorithm 9.9, neigh can be enhanced so that we
extract the jth neighbor efficiently, using a technique similar to
Algorithm 6.16 for range quantile queries.
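The range quantile machinery that such an enhancement would reuse can be sketched as follows. This is our own pointer-based toy version (the book's wavelet trees are levelwise and pointerless): quantile(i, j, k) descends once from the root, going left or right depending on how many symbols of S[i..j] map to the left child.

```python
class WaveletTree:
    """Toy pointer-based wavelet tree over an integer sequence.
    quantile(i, j, k) returns the k-th smallest value (0-based k,
    inclusive range), as in range quantile queries."""

    def __init__(self, seq, lo=None, hi=None):
        self.lo = min(seq) if lo is None else lo
        self.hi = max(seq) if hi is None else hi
        if self.lo == self.hi:          # leaf: a single symbol
            self.left = self.right = None
            return
        mid = (self.lo + self.hi) // 2
        # ranks[p] = how many of seq[0..p-1] go to the left child
        self.ranks = [0]
        for x in seq:
            self.ranks.append(self.ranks[-1] + (x <= mid))
        self.left = WaveletTree([x for x in seq if x <= mid], self.lo, mid)
        self.right = WaveletTree([x for x in seq if x > mid], mid + 1, self.hi)

    def quantile(self, i, j, k):
        if self.lo == self.hi:
            return self.lo
        left_count = self.ranks[j + 1] - self.ranks[i]
        if k < left_count:              # answer lies in the left child
            return self.left.quantile(self.ranks[i],
                                      self.ranks[j + 1] - 1, k)
        return self.right.quantile(i - self.ranks[i],
                                   j - self.ranks[j + 1], k - left_count)
```

Run on the row of a wavelet-tree-encoded adjacency list, the same single descent retrieves the jth (smallest-labeled) neighbor without decoding the whole row.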
- 307
In Section 9.4, there are simpler ways to build a planar graph that do not
change the embedding and still efficiently support most of the operations
[ref].
- 432
A very interesting extension of FM-indexes, which would fit well here, is the
representation of de Bruijn graphs for sequence assembly
[ref]
[ref]
[ref]
- 497
In Section 12.9, there are recent dynamic trie implementations based on
hashing that perform well in practice
[ref]
[ref].
Further, it might be more efficient to implement dynamic k^2-trees
using dynamic tries, instead of building on a dynamic LOUDS representation.
Finally, Tsur has recently improved some further dynamic tree operations to
run in time O(log n / log log n)
[ref].
- 504
In Section 13.1.4, there are new results about nearest smaller values
[ref]
and nearest larger values [ref].
- 507
In Section 13.1.4, there is a new tau-majority encoding that achieves the same
asymptotic space as the one we describe in the book, but has optimal query
time, O(1/tau) [ref].
They also show that this time is optimal in a more general sense than
previously known, and that obtaining output-sensitive time is unlikely.
- 508
In Section 13.2, there is new research about indexing repetitive data that
is not just text or trees. For example, graphs [ref] or grids
[ref].
- 511
In Section 13.2.1, there is a new LZ76 parsing algorithm that runs in
O(n) time using O(n log sigma) bits of space [ref].
Also, the algorithms by Kärkkäinen, Kempa, and Puglisi have now appeared
in a journal [ref].
- 516
At the end of Section 13.2.2, we mention that we could use the simpler LZ76
parsing that allows overlaps between sources and targets.
Here we describe a simple variant of such a parsing.
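A minimal sketch of such a parsing (our own naive quadratic version, not an efficient algorithm): each phrase is the longest prefix of the remaining text that also starts at some earlier position, plus one explicit character, and the copy may overlap the text being produced.

```python
def lz76_parse(T):
    """Greedy LZ76-style parse allowing source/target overlap.
    Each phrase is (source_pos, copy_length, explicit_char);
    source_pos is -1 when nothing is copied."""
    phrases, i, n = [], 0, len(T)
    while i < n:
        best_len, best_src = 0, -1
        for s in range(i):
            l = 0
            # the source may run past i, i.e., overlap the target
            while i + l < n - 1 and T[s + l] == T[i + l]:
                l += 1
            if l > best_len:
                best_len, best_src = l, s
        phrases.append((best_src, best_len, T[i + best_len]))
        i += best_len + 1
    return phrases

def lz76_decode(phrases):
    """Left-to-right decoding; copying one character at a time
    makes overlapping sources work automatically."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            out.append(out[src + k])
        out.append(ch)
    return ''.join(out)
```

For example, "aaaaa" parses into just two phrases, the second copying three characters from a source that overlaps its own target: [(-1, 0, 'a'), (0, 3, 'a')].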
- 520
In footnote 7, the thesis of Nicola Prezza shows a string family where z =
Theta (r log n) (Fibonacci strings) and another where r =
Theta (z log n) (de Bruijn sequences), where z refers to the
original LZ76 parsing. The example given in the book is not sufficient to
conclude that r may be Theta (z sqrt(n)).
- 521
The main problem of run-length compressed suffix arrays, that is, their
sampling, was finally solved in 2018
[ref].
- 521
At the end of subsection "Sampling", there is an improved version of the
suffix array of an alignment
[ref].
Something to add? Please let us know
and enter our Hall of Fame!