Practical Algorithms for Transposition-Invariant String-Matching
Kjell Lemström, Gonzalo Navarro and Yoan Pinzon
We consider the problems of (1) longest common subsequence (LCS)
of two given strings in the case where the first may be shifted by
some constant (that is, transposed) to match the second, and (2)
transposition-invariant text searching using indel distance. These
problems have applications in music comparison and retrieval. We
introduce two novel techniques to solve these problems
efficiently. The first is based on the branch and bound method,
the second on bit-parallelism. Our branch and bound algorithm
computes the longest common transposition-invariant subsequence
(LCTS) in time O((m^2 + log log sigma) log sigma) in the
best case and O((m^2 + log sigma) sigma) in the worst case,
where m and sigma, respectively, are the length of the
strings and the size of the alphabet. On the other hand, we show
that the same problem can be solved by using bit-parallelism and
thus obtain a speedup of O(w/log m) over the classical
algorithms, where the computer word has w bits. The advantage of
this latter algorithm over the present bit-parallel ones is that
it allows the use of more complex distances, including general
integer weights. Since our branch and bound method is very
flexible, it can be further improved by combining it with other
efficient algorithms such as our novel bit-parallel algorithm. We
experiment on several combination possibilities and discuss which
are the best settings for each of those combinations. Our
algorithms are easily extended to other musically relevant cases,
such as delta-matching and polyphony (where there are several
parallel texts to be considered). We also show how our
bit-parallel algorithm is adapted to text searching and illustrate
its effectiveness in complex cases where the only known competing
method is the use of brute force.