Book chapter

Creating Textual Corpora Based on Wikipedia and Knowledge Graphs

2024; Springer International Publishing; Language: English

10.1007/978-3-031-60221-4_32

ISSN

2367-3370

Authors

Janneth Chicaiza, Mateo Martínez-Velásquez, Fabian Soto-Coronel, Nadget Bouayad-Agha

Topic(s)

Topic Modeling

Abstract

Information overload reduces the ability of machines to find relevant information. Furthermore, when dynamic topics or emerging events capture the community's interest, unofficial or unreliable sources of information quickly appear that, instead of meeting users' information needs, add to the misinformation. To address this issue, this paper proposes a method for creating domain-specific text corpora that can offer immediate answers on a particular topic. The approach first builds a vocabulary of the domain and then assembles a textual corpus from Wikipedia pages related to the domain's terms. The authors tested the method by creating a specialized corpus for the pollution domain and implementing a process to answer queries about that domain. Preliminary results show that the Q&A system could provide accurate and up-to-date information on the topic, drawing on Wikipedia, a free-content platform that users continuously update.
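Illustration only: the pipeline outlined in the abstract (a domain vocabulary, a corpus of Wikipedia text related to its terms, and a query step) can be sketched as below. This is not the authors' implementation. The vocabulary, the PAGES dictionary standing in for downloaded Wikipedia articles, the naive sentence splitting, and the keyword-overlap ranking used to answer questions are all hypothetical simplifications chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Hypothetical domain vocabulary (the chapter builds one for the pollution domain).
VOCABULARY = ["air pollution", "smog", "particulate matter"]

# Hypothetical stand-in for Wikipedia pages: title -> plain text.
# A real pipeline would retrieve these from Wikipedia; that step is omitted here.
PAGES = {
    "Smog": "Smog is a kind of air pollution. Smog reduces visibility in cities.",
    "Particulates": "Particulate matter consists of tiny particles suspended in air. "
                    "Fine particulate matter can enter the lungs.",
    "Football": "Football is a team sport played with a ball.",
}


def tokenize(text):
    """Return lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_corpus(vocabulary, pages):
    """Keep (title, sentence) pairs from pages that mention at least one vocabulary term."""
    corpus = []
    for title, text in pages.items():
        lowered = text.lower()
        if any(term in lowered for term in vocabulary):
            # Naive sentence split: end punctuation followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
                if sentence:
                    corpus.append((title, sentence))
    return corpus


def answer(query, corpus):
    """Return the (title, sentence) pair sharing the most word tokens with the query."""
    if not corpus:
        return None
    query_counts = Counter(tokenize(query))

    def overlap(item):
        # Size of the multiset intersection between query and sentence tokens.
        return sum((query_counts & Counter(tokenize(item[1]))).values())

    return max(corpus, key=overlap)


corpus = build_corpus(VOCABULARY, PAGES)
print(answer("What can fine particulate matter enter?", corpus))
# prints ('Particulates', 'Fine particulate matter can enter the lungs.')
```

The off-topic "Football" page is excluded because it mentions no vocabulary term; a realistic system would replace the substring filter and the overlap score with proper term matching and a stronger retrieval or reading model.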
