Distributed Generation of Suffix Arrays
Gonzalo Navarro, João Paulo Kitajima, Berthier Ribeiro and Nivio Ziviani
An algorithm for the distributed computation of suffix arrays for large texts
is presented. The parallelism model is that of a set of
sequential tasks which execute in parallel and exchange messages among
them. The underlying architecture is that of a high bandwidth network
of processors. Our algorithm builds the suffix array by quickly assigning
an independent subproblem to each processor and completing the process
with a final local sorting. We
demonstrate that the algorithm has time complexity of O (b log n)
computation and O (b) communication in the average case, where b
corresponds to the local text size on
each processor (i.e., text size n divided by r, the number of
processors). This is faster than the best known sequential algorithm and
improves over previous parallel algorithms to build suffix arrays, both
in time complexity and scaling factor.