Document Listing on Repetitive Collections with Guaranteed Performance
Gonzalo Navarro
We consider document listing on string collections, that is, finding in which
strings a given pattern appears. In particular, we focus on repetitive
collections: a collection of size N over alphabet [1,a] is
composed of D copies of a string of size n, and s
single-character edits are applied on the copies. We introduce the first
document listing index with size O~(n + s), precisely
O((n lg a + s lg^2 N) lg D) bits, and with useful worst-case time
guarantees: Given a pattern of length m, the index reports the
ndoc strings where it appears in time O(m^2 + m lg N (lg D
+ lg^e N) ndoc), for any constant e > 0.