Creating Textual Corpora Based on Wikipedia and Knowledge Graphs
2024; Springer International Publishing; Language: English
DOI: 10.1007/978-3-031-60221-4_32
ISSN: 2367-3370
Authors: Janneth Chicaiza, Mateo Martínez-Velásquez, Fabian Soto-Coronel, Nadget Bouayad-Agha
Topic(s): Topic Modeling
Abstract: Information overload reduces the capability of machines to find relevant information. Furthermore, when dynamic topics or emerging events arouse the community's interest, unofficial or unreliable sources of information quickly emerge that, instead of satisfying users' information needs, increase misinformation. To address this issue, this paper proposes a method for creating domain-specific text corpora that can offer immediate answers on a particular topic. The approach involves first building a vocabulary of the domain and then assembling a textual corpus from the Wikipedia pages related to the terms in that vocabulary. The authors tested the method by creating a specialized corpus for the pollution domain and implementing a process to answer queries about the domain. Preliminary results show that the Q&A system could provide accurate and up-to-date information on the topic, based on Wikipedia, a free-content platform that users continuously update.
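The two steps named in the abstract (vocabulary first, then a corpus assembled from the pages matching each term) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_corpus` and `answer`, the word-overlap scoring, and the toy `PAGES` dictionary standing in for Wikipedia are all assumptions made for illustration. A real pipeline would replace `PAGES.get` with a call to a Wikipedia client and would likely use a stronger retrieval model.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def tokenize(text: str) -> set:
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_corpus(
    vocabulary: List[str],
    fetch_page: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each vocabulary term to the text of its page, skipping misses.

    `fetch_page` is a hypothetical hook: any callable that returns page
    text for a term, or None when no page is found.
    """
    corpus = {}
    for term in vocabulary:
        text = fetch_page(term)
        if text:
            corpus[term] = text
    return corpus


def answer(query: str, corpus: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (term, sentence) for the sentence sharing most words with the query.

    Plain word overlap is an assumed stand-in for the paper's query
    answering process, which the abstract does not describe.
    """
    query_tokens = tokenize(query)
    best, best_score = None, 0
    for term, text in corpus.items():
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            score = len(query_tokens & tokenize(sentence))
            if score > best_score:
                best, best_score = (term, sentence), score
    return best


# Toy placeholder text, NOT real Wikipedia content.
PAGES = {
    "air pollution": (
        "Air pollution is the contamination of air by harmful substances. "
        "It can damage human health."
    ),
    "water pollution": (
        "Water pollution is the contamination of water bodies. "
        "It often results from human activities."
    ),
}

# "noise pollution" has no page in the toy store, so it is skipped.
corpus = build_corpus(
    ["air pollution", "water pollution", "noise pollution"], PAGES.get
)
result = answer("What is water pollution?", corpus)
print(result)
```

Passing the page fetcher in as a parameter keeps the corpus-building step independent of any particular Wikipedia client, so the same sketch can be exercised offline with a dictionary.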