Wheeler Maps
Andrej Baláž, Travis Gagie, Adrián Goga, Simon Heumos,
Gonzalo Navarro, Alessia Petescia, and Jouni Sirén
Motivated by challenges in pangenomic read alignment, we propose a
generalization of Wheeler graphs that we call Wheeler maps. A Wheeler map
stores a text T[1..n] and an assignment of tags to the characters of
T such that we can preprocess a pattern P[1..m] and then, given
i and j, quickly return all the distinct tags labeling the first
characters of the occurrences of P[i..j] in T. For the applications
that most interest us, characters with long common contexts are likely to have
the same tag, so we consider the number t of runs in the list of tags
sorted by their characters' positions in the Burrows-Wheeler Transform (BWT)
of T. We show how, given a straight-line program with g rules for
T, we can build an O(g + r + t)-space Wheeler map, where
r is
the number of runs in the BWT of T, with which we can preprocess a
pattern P[1..m] in O(m log n) time and then return the
k
distinct tags for P[i..j] in optimal O(k) time for any given
i
and j.