###
FIX to Compressed Full-Text Indexes

#### Level: Small

Not an error, but still making reader's life difficult, is
the vagueness about Sadakane's CSA space. Theorem 11 gives the space
*1/e n H_0 + O(n log log s)*, where *e* stands for epsilon, and is
linked to the time to locate, *O(log^e n)*, and to display. This is the
original space/time Sadakane reported in his papers.
Next we say that, by using
run-length compression on *Psi*, one can get *nH_k (log s + loglog n)
+ O(n)* bits, "while retaining the search times". These refer to counting
times. Where did the epsilon go? What happens to the locating and extraction
time? In fact, this epsilon does not apply anymore. To find the new locate
times, you have to consider the next paragraph, "In practice", where a sampling
mechanism is described. Yet, it is not well analyzed until it is used again for
the FM-index (where it was first introduced indeed), two paragraphs before
Theorem 14. There it is shown that, via sampling, one can get
*O(log^(1+e) n)* time and *O(n/log^e n)* extra bits of space.

In the "In practice" part of Theorem 13 we get back to Sadakane's CSA, and
say that a delta-encoding of the differences in *Psi* actually get
*nH_k+O(n loglog s)* bits of space, instead of
*nH_0+O(n loglog s)*. Note the missing *1/e* in the last
formula. It appears when one adds higher-order *Psi* arrays to implement
the inverse suffix array. Again, we do not use the inverse suffix array here
(as the higher-order *Psi*s are not compressible to *H_k* as far as
we know), but the sampling method.

Finally, it is a mistake to say that German is agglutinating (thanks to
Zsuzsanna Lipták for noting this). We referred to languages where many
particles are concatenated to form long words that would be phrases in other
languages, so cutting the words to find query terms becomes nontrivial.
Besides, the right term is "agglutinative".