Transposition Invariant String Matching
Veli Mäkinen, Gonzalo Navarro and Esko Ukkonen
Given strings A and B over an alphabet S that is a subset of U,
where U is some numerical universe closed under addition and subtraction,
and a distance function d(A,B) that gives the score of the best (partial)
matching of A and B, the transposition invariant distance is
min {d(A+t,B) : t in U}, where A+t = (a(1)+t)(a(2)+t)...(a(m)+t).
We study the problem of computing the transposition invariant distance for
various distance (and similarity) functions d, including Hamming
distance, longest common subsequence (LCS), Levenshtein
distance, and their versions where the exact matching condition is replaced
by an approximate one.
For all these problems, we give algorithms whose time complexities are close
to the known upper bounds for the same problems without transposition
invariance, and for some we achieve these upper bounds. In particular, we show
how sparse dynamic programming can be used to solve transposition invariant
problems, and we describe its connection with multidimensional range-minimum
search. As a byproduct, we give
improved sparse dynamic programming algorithms to compute LCS and Levenshtein
distance.
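
To make the definition concrete, the following Python sketch computes the transposition invariant Hamming distance and the transposition invariant LCS length for integer strings. It is a naive baseline for illustration only, not the algorithms of the paper: it relies on the observation that, under exact matching, only transpositions of the form t = b(j) - a(i) can produce any match, so it is enough to try those. All function names and the sample integer sequences are assumptions made for this example. Since LCS is a similarity measure, the sketch maximizes over t instead of minimizing.

```python
from collections import Counter


def transpose(a, t):
    """Return A+t, i.e. every symbol of a shifted by t."""
    return [x + t for x in a]


def hamming(a, b):
    """Plain Hamming distance between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


def ti_hamming(a, b):
    """Transposition invariant Hamming distance (illustrative sketch).

    Position i matches under transposition t exactly when t == b[i] - a[i],
    so the best t is the most frequent positionwise difference.
    """
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    if not a:
        return 0
    votes = Counter(y - x for x, y in zip(a, b))
    return len(a) - max(votes.values())


def lcs_length(a, b):
    """Classical quadratic dynamic programming for the LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ti_lcs_length(a, b):
    """Transposition invariant LCS length by brute force (naive baseline).

    Tries every candidate t = b[j] - a[i]; any other t yields no matches.
    """
    candidates = {y - x for x in a for y in b}
    return max((lcs_length(transpose(a, t), b) for t in candidates), default=0)


if __name__ == "__main__":
    # Hypothetical integer sequences, e.g. pitch values of two melodies.
    a = [60, 62, 64, 65]
    b = [65, 67, 69, 71]
    print(hamming(a, b), ti_hamming(a, b))
    print(lcs_length(a, b), ti_lcs_length(a, b))
```

The brute-force LCS variant runs one quadratic dynamic program per candidate transposition, which is the kind of cost that the sparse dynamic programming approach mentioned above is meant to reduce.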