Matchsimile: A Flexible Approximate Matching Tool for Personal Names
Searching
Gonzalo Navarro, Ricardo Baeza-Yates and João Marcelo Arcoverde
In this paper we present the architecture and algorithms behind Matchsimile,
an approximate string matching lookup tool
designed specifically for searching personal and company names
in a large textual database. Part of a larger information retrieval
environment, this specific engine accepts an input text file with a set of
personal and company names and a set of restrictions for the search.
After batch processing, the engine outputs another text file
containing the occurrences that match each record of the input names
file, according to its search parameters. Beyond the similarity
search capabilities applied to each word that forms a name, the tool
considers a set of personal name formation rules for those words, such as
combination, abbreviation, character mapping, duplicate detection,
ordering, and word omission and insertion, among others. This engine is used
in a successful commercial application (also named Matchsimile), which
uses this tool to let users search for lawyers' names in many official law
journal publications.
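
To make the combination of word-level similarity search and name formation rules concrete, the following is a minimal sketch in Python. It is not the Matchsimile algorithm: the function names (edit_distance, word_matches, name_matches), the thresholds max_errors and max_missing, and the greedy word assignment are all assumptions chosen for illustration. It covers only three of the rules listed above (abbreviation, ordering, and word omission), using plain Levenshtein distance for the per-word comparison.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def word_matches(pattern: str, candidate: str, max_errors: int = 1) -> bool:
    """Compare two words, ignoring case and a trailing period."""
    p = pattern.lower().rstrip(".")
    c = candidate.lower().rstrip(".")
    if not p or not c:
        return False
    # Assumed abbreviation rule: an initial matches any word that
    # starts with the same letter.
    if len(p) == 1 or len(c) == 1:
        return p[0] == c[0]
    return edit_distance(p, c) <= max_errors


def name_matches(pattern: str, text: str,
                 max_errors: int = 1, max_missing: int = 1) -> bool:
    """Order-insensitive name comparison with a word-omission budget.

    Each text word is consumed by at most one pattern word. The
    assignment is greedy (first fit), which is simple but not optimal.
    """
    pattern_words = pattern.split()
    remaining = text.split()
    missing = 0
    for pw in pattern_words:
        hit = next((tw for tw in remaining
                    if word_matches(pw, tw, max_errors)), None)
        if hit is None:
            missing += 1           # pattern word omitted in the text
        else:
            remaining.remove(hit)  # do not reuse a matched text word
    # Require at least one pattern word to have matched.
    return missing <= max_missing and missing < len(pattern_words)


if __name__ == "__main__":
    print(name_matches("Ricardo Baeza Yates", "Yates R. Baeza"))
    print(name_matches("Gonzalo Navarro", "Navaro"))
    print(name_matches("Gonzalo Navarro", "Ricardo Baeza"))
```

In this sketch the first two calls report a match (reordered words with an initial, and one omitted word plus one typing error), while the third does not. A production tool would search many names against a large text at once rather than comparing one pair at a time as done here.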