Optimized Binary Search and Text Retrieval
Eduardo Barbosa, Gonzalo Navarro, Ricardo Baeza-Yates, Chris Perleberg and Nivio Ziviani
We present an algorithm that minimizes the expected cost of indirect binary
search for data with non-constant access costs, such as disk data.
Indirect binary search means that sorted access to the data is obtained through
an array of pointers to the raw data. One immediate application of this
algorithm is to improve the retrieval performance of disk databases that
are indexed using the suffix array model (also called PAT array).
We take into account the cost model of magnetic and optical disks,
together with advance knowledge of the expected size of the subproblem
that results from reading each disk track.
This information is used to devise a modified binary search
algorithm that reduces overall retrieval costs.
Both an optimal and a practical algorithm are presented, together with
analytical and experimental results.
For 100 megabytes of text, the practical algorithm costs 60% of the standard
binary search cost on the magnetic disk and 65% on the optical disk.
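
As a minimal illustration of the indirect binary search described above, the following Python sketch searches a suffix array held in memory. It shows only the standard, cost-unaware search that the paper takes as its baseline, not the optimized algorithm. The names build_suffix_array and indirect_binary_search are hypothetical, and the naive construction is meant for small examples only. In the disk setting, each text[...] access inside the loop would correspond to a disk read, which is the cost the paper sets out to reduce.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration; not suitable for large texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def indirect_binary_search(text, pointers, pattern):
    """Return the half-open range [lo, hi) of entries in pointers whose
    suffix starts with pattern.

    The pointers array is sorted by the suffixes it points to, so every
    comparison goes through a pointer into the raw text (the indirection).
    """
    m = len(pattern)

    def bound(include_equal):
        lo, hi = 0, len(pointers)
        while lo < hi:
            mid = (lo + hi) // 2
            # One access to the raw data; on disk this would be a seek and read.
            prefix = text[pointers[mid]:pointers[mid] + m]
            if prefix < pattern or (include_equal and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return bound(False), bound(True)


if __name__ == "__main__":
    text = "banana"
    pointers = build_suffix_array(text)
    lo, hi = indirect_binary_search(text, pointers, "ana")
    # Text positions where the pattern occurs
    print(sorted(pointers[lo:hi]))
```

The search performs O(log n) accesses to the raw text for each of the two bounds. A cost-aware variant would choose the probe position using the disk cost model instead of always probing the midpoint.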