Matchsimile: A flexible approximate matching tool for searching proper names
2002; Wiley; Volume: 54; Issue: 1; Language: English
DOI: 10.1002/asi.10178
ISSN: 1532-2890
Authors: Gonzalo Navarro, Ricardo Baeza‐Yates, João Marcelo Azevedo Arcoverde
Topic(s): Topic Modeling
Abstract: We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool designed specifically for extracting person and company names from large texts. Part of a larger information extraction environment, the engine receives a large set of proper names to search for, a text to search, and search options, and outputs all occurrences of those names found in the text. Beyond similarity search at the intraword level, the tool applies a set of person-name formation rules at the word level, covering combination, abbreviation, duplicity detection, ordering, and word omission and insertion, among others. The engine is used in a successful commercial application (also named Matchsimile), which searches for lawyer names in official law publications.
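The two levels of matching described in the abstract can be illustrated with a minimal Python sketch. It is not Matchsimile's actual algorithm, which the abstract does not spell out: the function names (`edit_distance`, `word_matches`, `name_matches`), the error thresholds, and the greedy word pairing are all assumptions made for illustration. Intraword similarity is approximated with plain Levenshtein distance, and only three of the word-level rules (abbreviation, ordering, omission/insertion) are modeled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two words (intraword level)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def word_matches(pattern_word: str, text_word: str, max_errors: int = 1) -> bool:
    """Hypothetical word test: accept an initial ("G." for "Gonzalo")
    or a spelling within max_errors edits."""
    p = pattern_word.lower()
    t = text_word.lower().rstrip(".")
    if len(t) == 1:                      # abbreviation rule
        return p.startswith(t)
    return edit_distance(p, t) <= max_errors


def name_matches(name: str, candidate: str, max_errors: int = 1,
                 max_omitted: int = 1, max_inserted: int = 0) -> bool:
    """Hypothetical word-level test: words may appear in any order, up to
    max_omitted pattern words may be missing, and up to max_inserted extra
    words may appear in the candidate. Pairing is greedy, not optimal."""
    remaining = candidate.split()
    omitted = 0
    for pw in name.split():
        hit = next((tw for tw in remaining
                    if word_matches(pw, tw, max_errors)), None)
        if hit is None:
            omitted += 1                 # omission rule
        else:
            remaining.remove(hit)        # each text word is used once
    return omitted <= max_omitted and len(remaining) <= max_inserted
```

Under these assumptions, `name_matches("Gonzalo Navarro", "G. Navarro")` accepts the abbreviated form and `name_matches("Gonzalo Navarro", "Navarro Gonzalo")` accepts the reordered one, while an unrelated name is rejected. A production tool searching a large set of names over a large text would need an index or multi-pattern search rather than this pairwise comparison.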