Exploiting named entities for bilingual news clustering
2014; Wiley; Volume: 66; Issue: 2; Language: English
DOI: 10.1002/asi.23175
ISSN: 2330-1643
Authors: Soto Montalvo, Raquel Martínez, Víctor Fresno, Agustín Delgado
Topic(s): Text and Document Classification Technologies
Abstract: In this article, we present a new algorithm for clustering a bilingual collection of comparable news items into groups of specific topics. Our hypothesis is that named entities (NEs) are more informative than other features in the news when clustering fine-grained topics. The algorithm does not require the number of clusters as input, and it carries out the clustering based only on the named entities that the news items share. This proposal is evaluated using different data sets and outperforms other state-of-the-art algorithms, thereby proving the plausibility of the approach. In addition, because the applicability of our approach depends on the possibility of identifying equivalent named entities among the news items, we propose a heuristic system to identify equivalent named entities in the same and different languages; this system obtains good performance.
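The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of the general idea: grouping news items by the named entities they share, without fixing the number of clusters in advance. Everything in it is an assumption made for illustration and not the authors' method: the function names `normalize_entity` and `cluster_by_shared_entities`, the `min_shared` threshold, the use of accent stripping and case folding as a stand-in for the equivalence heuristic, and the choice of connected components as the grouping rule.

```python
import unicodedata
from itertools import combinations


def normalize_entity(name: str) -> str:
    """Hypothetical stand-in for NE equivalence: case-fold and strip accents."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def cluster_by_shared_entities(docs, min_shared=2):
    """Group documents that share at least `min_shared` normalized entities.

    `docs` maps a document id to an iterable of entity strings.
    Clusters are the connected components of the "shares enough entities"
    graph, so the number of clusters is an output, not an input.
    """
    entity_sets = {
        doc_id: {normalize_entity(e) for e in ents}
        for doc_id, ents in docs.items()
    }
    parent = {doc_id: doc_id for doc_id in entity_sets}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of documents with enough entities in common.
    for a, b in combinations(entity_sets, 2):
        if len(entity_sets[a] & entity_sets[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for doc_id in entity_sets:
        clusters.setdefault(find(doc_id), set()).add(doc_id)
    return sorted(sorted(members) for members in clusters.values())


# Made-up English/Spanish items, for illustration only.
example_docs = {
    "en1": ["Barack Obama", "Angela Merkel", "Berlin"],
    "es1": ["Barack Obama", "Angela Merkel", "Berlín"],
    "en2": ["Rafael Nadal", "Roland Garros"],
    "es2": ["Rafael Nadal", "Roland Garros", "París"],
}
example_clusters = cluster_by_shared_entities(example_docs)
```

With these made-up items, the two English/Spanish pairs fall into two separate groups because each pair shares at least two normalized entities and no entity crosses between the pairs. A real system would need a far richer equivalence heuristic than accent stripping, since many entities are translated or transliterated rather than merely re-accented.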