Wheeler Maps

Andrej Baláž, Travis Gagie, Adrián Goga, Simon Heumos, Gonzalo Navarro, Alessia Petescia, and Jouni Sirén

Motivated by challenges in pangenomic read alignment, we propose a generalization of Wheeler graphs that we call Wheeler maps. A Wheeler map stores a text T[1..n] and an assignment of tags to the characters of T such that we can preprocess a pattern P[1..m] and then, given i and j, quickly return all the distinct tags labeling the first characters of the occurrences of P[i..j] in T. For the applications that most interest us, characters with long common contexts are likely to have the same tag, so we consider the number t of runs in the list of tags sorted by their characters' positions in the Burrows-Wheeler Transform (BWT) of T. We show how, given a straight-line program with g rules for T, we can build an O(g + r + t)-space Wheeler map, where r is the number of runs in the BWT of T, with which we can preprocess a pattern P[1..m] in O(m log n) time and then return the k distinct tags for P[i..j] in optimal O(k) time for any given i and j.