Distributed Generation of Suffix Arrays

Gonzalo Navarro, João Paulo Kitajima, Berthier Ribeiro and Nivio Ziviani

An algorithm for the distributed computation of suffix arrays for large texts is presented. The parallelism model is that of a set of sequential tasks which execute in parallel and exchange messages among them. The underlying architecture is that of a high bandwidth network of processors. Our algorithm builds the suffix array by quickly assigning an independent subproblem to each processor and completing the process with a final local sorting. We demonstrate that the algorithm has time complexity of O (b log n) computation and O (b) communication in the average case, where b corresponds to the local text size on each processor (i.e., text size n divided by r, the number of processors). This is faster than the best known sequential algorithm and improves over previous parallel algorithms to build suffix arrays, both in time complexity and scaling factor.