Approximate Regular Expression Searching with Arbitrary Integer Weights

Gonzalo Navarro

We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst case time O(mn/log_k(s)), where s is the amount of main memory that can be allocated. The algorithm permits arbitrary integer weights and matches the complexity of the best previous techniques, but it is simpler and faster in practice. In our way, we define a new recurrence for approximate searching where the current values depend only on previous values. Interestingly, our algorithm turns out to be a relevant option also for simple approximate string matching with arbitrary integer weights.