Matchsimile: A flexible approximate matching tool for searching proper names

2002; Wiley; Volume: 54; Issue: 1; Language: English

10.1002/asi.10178

ISSN

1532-2890

Authors

Gonzalo Navarro, Ricardo Baeza‐Yates, João Marcelo Azevedo Arcoverde

Topic(s)

Topic Modeling

Abstract

We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this engine receives a large set of proper names to search for, a text to search, and search options, and outputs all occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intraword level, the tool applies a set of person name formation rules at the word level, such as combination, abbreviation, duplicate detection, ordering, and word omission and insertion, among others. The engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.
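To make the two levels mentioned in the abstract concrete, here is a minimal Python sketch. It is not Matchsimile's actual algorithm, which the abstract does not detail. It assumes plain Levenshtein distance for the intraword level and a single word-level rule (an initial may stand for a full word). The function names `edit_distance`, `word_matches`, and `name_matches` and the `max_errors` parameter are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_matches(pattern_word: str, text_word: str, max_errors: int = 1) -> bool:
    """Hypothetical word-level rule: accept an initial or a near-identical spelling."""
    p = pattern_word.lower()
    t = text_word.lower().rstrip(".")
    if len(t) == 1:
        # Abbreviation such as "R." standing for "Ricardo".
        return p.startswith(t)
    return edit_distance(p, t) <= max_errors


def name_matches(name: str, candidate: str, max_errors: int = 1) -> bool:
    """Compare two names word by word, in order (no omission or insertion)."""
    name_words, cand_words = name.split(), candidate.split()
    return len(name_words) == len(cand_words) and all(
        word_matches(p, c, max_errors) for p, c in zip(name_words, cand_words)
    )
```

Under these assumptions, `name_matches("Ricardo Baeza-Yates", "R. Baeza-Yates")` accepts the abbreviated form, and `name_matches("Gonzalo Navarro", "Gonzalo Navaro")` tolerates the one-letter misspelling. The other rules the abstract lists (reordering, word omission and insertion, combination) are deliberately left out of this sketch.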
