Integrating WordNet knowledge to supplement training data in semi-supervised agglomerative hierarchical clustering for text categorization
2001; Wiley; Volume: 16; Issue: 8; Language: English
10.1002/int.1042
ISSN: 1098-111X
Authors: Mohammed Benkhalifa, Abdelhak Mouradi, Houssaine Bouyakhf
Topic(s): Topic Modeling
International Journal of Intelligent Systems, Volume 16, Issue 8, August 2001, pp. 929-947
First published: 28 June 2001. https://doi.org/10.1002/int.1042. Citations: 12
Mohammed Benkhalifa (corresponding author), m.Benkhalifa@AlAkhawayn.ma, School of Science and Engineering, Al Akhawayn University in Ifrane (AUI), P.O. Box 1828, Av. Hassan II, Ifrane 53000, Morocco
Abdelhak Mouradi, mouradi@ensias.um5souissi.ac.ma, Ecole Nationale Superieure d'Informatique et d'Analyses des Systèmes (ENSIAS), Mohammed V University, P.O. Box 713, Agdal Rabat, Morocco
Houssaine Bouyakhf, bouyakhf@fsr.ac.ma, Computer Science Department, Mohammed V University, Faculty of Sciences in Rabat, Morocco
Abstract: Text categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. Many learning approaches have been applied to TC and have proved effective. Nevertheless, TC poses many challenges to machine learning. In this paper, we suggest, for text categorization, the integration of external WordNet lexical information to supplement training data for a semi-supervised clustering algorithm which (i) uses a finite design set of labeled data to (ii) help agglomerative hierarchical clustering (AHC) algorithms partition a finite set of unlabeled data and then (iii) terminates without the capacity to classify other objects. This algorithm is the "semi-supervised agglomerative hierarchical clustering algorithm" (ssAHC). Our experiments use the Reuters 21578 database and consist of binary classifications for categories selected from the 89 TOPICS classes of the Reuters collection. Using the vector space model (VSM), each document is represented by its original feature vector augmented with an external feature vector generated using WordNet. We verify experimentally that the integration of WordNet helps ssAHC improve its performance, effectively addresses the classification of documents into categories with few training documents, and does not interfere with the use of training data.
© 2001 John Wiley & Sons, Inc.
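The augmented document representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the external vector is built from WordNet, so the small `TOY_LEXICON` dictionary below is a hypothetical stand-in for WordNet lookups, and the function names and raw term-count weighting are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to related concept labels.
# A real system would query WordNet; this toy table is an assumption.
TOY_LEXICON = {
    "wheat": ["grain", "cereal"],
    "corn": ["grain", "cereal"],
    "oil": ["fuel"],
    "crude": ["fuel"],
}


def term_vector(tokens, vocabulary):
    """Original VSM feature vector: raw term counts over the vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def external_vector(tokens, concepts, lexicon):
    """External feature vector: counts of lexicon concepts evoked by the tokens."""
    counts = Counter(c for t in tokens for c in lexicon.get(t, []))
    return [counts[concept] for concept in concepts]


def augmented_vector(tokens, vocabulary, concepts, lexicon):
    """Concatenate the original vector with the external (lexicon-based) vector."""
    return term_vector(tokens, vocabulary) + external_vector(tokens, concepts, lexicon)


docs = [["wheat", "prices", "rose"], ["corn", "exports", "fell"]]
vocabulary = sorted({t for d in docs for t in d})
concepts = sorted({c for cs in TOY_LEXICON.values() for c in cs})
vectors = [augmented_vector(d, vocabulary, concepts, TOY_LEXICON) for d in docs]

# Dot product restricted to the original terms, then over the full vector.
n = len(vocabulary)
overlap_original = sum(a * b for a, b in zip(vectors[0][:n], vectors[1][:n]))
overlap_augmented = sum(a * b for a, b in zip(vectors[0], vectors[1]))
```

In this toy case the two documents share no terms, so their original vectors do not overlap, while the concatenated vectors overlap on the shared concept features. That is one way external lexical knowledge could help when a category has few training documents.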