###
Approximate Matching of Run-Length Compressed Strings

####
Veli Mäkinen, Gonzalo Navarro and Esko Ukkonen

We focus on the problem of approximate matching of strings that have been
compressed using run-length encoding. Previous studies have concentrated
on the problem of computing the longest common subsequence (LCS) between
two strings of length *m* and *n*, compressed to *m'* and *n'* runs.
We extend an existing algorithm for the LCS to the Levenshtein distance
achieving *O(m'n+n'm)* complexity. Furthermore, we extend this algorithm
to a weighted edit distance model, where the weights of the three basic
edit operations can be chosen arbitrarily. This approach gives
also an algorithm for approximate searching of a pattern of *m* letters (*m'*
runs) in a text of *n* letters (*n'* runs) in *O(mm'n')* time. Then we propose
improvements for a greedy algorithm for the LCS, and conjecture that the
improved algorithm has *O(m'n')* expected case complexity. Experimental
results are provided to support the conjecture.