###
Optimal-Time Dictionary-Compressed Indexes

####
Anders Roy Christiansen, Mikko Berggren Ettienne, Tomasz Kociumaka,
Gonzalo Navarro, and Nicola Prezza

We describe the first self-indexes able to count and locate pattern
occurrences in optimal time within a space bounded by the size of the most
popular dictionary compressors. To achieve this result we combine several
recent findings, including *string attractors* --- new combinatorial
objects encompassing most known compressibility measures for highly repetitive
texts ---, and grammars based on *locally-consistent parsing*.
More in detail, let *γ* be the size of the smallest attractor for a
text *T* of length *n*. The measure *γ* is an (asymptotic) lower
bound to the size of dictionary compressors based on Lempel--Ziv, context-free
grammars, and many others. The smallest known text representations in terms of
attractors use space *O(γ log(n/γ))*, and
our lightest indexes work within the same asymptotic space. Let
*e>0* be a suitably small constant fixed at construction time,
*m* be the pattern length, and *occ* be the number of its text
occurrences. Our index counts pattern occurrences in
*O(m + log^(2+e) n)* time, and locates them in
*O(m + (occ+1) log^e n)* time.
These times already outperform those of most dictionary-compressed indexes,
while obtaining the least asymptotic space for any index searching within
*O((m+occ) polylog n)* time.
Further, by increasing the space to *O(γ log(n/γ) log^e
n)*, we reduce the
locating time to the optimal *O(m+occ)*, and within
*O(γ log(n/γ) log n)*
space we can also count in optimal *O(m)* time. No dictionary-compressed index had obtained this time before. All our indexes can be
constructed in *O(n)* space and *O(n log n)* expected time.
As a byproduct of independent interest, we show how to build, in *O(n)*
expected time and without knowing the size *γ* of the smallest
attractor (which is NP-hard to find), a run-length context-free grammar of
size *O(γ log(n/γ))* generating (only) *T*.
As a result, our indexes can be built without knowing *γ*.