Re-ranking and TOPSIS-based ensemble feature selection with multi-stage aggregation for text categorization
2023; Elsevier BV; Volume: 168; Language: English
10.1016/j.patrec.2023.02.027
ISSN: 1872-7344
Authors: Guanghua Fu, Bencheng Li, Yongsheng Yang, Chaofeng Li
Topic(s): Advanced Clustering Algorithms Research
Abstract: By reducing data dimensionality, feature selection (FS) can improve the accuracy and lower the computational cost of machine learning models, especially those trained on high-dimensional text datasets. To improve robustness, ensemble feature selection (EFS), in which different aggregation methods are applied, has recently attracted considerable attention. This paper proposes a four-stage EFS method called re-ranking and TOPSIS-based ensemble feature selection (RTEFS). In the first stage of RTEFS, features are extracted from the text corpus. In the second, a union subset is constructed from the outputs of six filter-based FS methods applied to the preprocessed feature vectors. A re-ranking stage then evaluates the features in this subset. In the ensemble feature ranking stage, the TOPSIS method aggregates the ranking lists produced by two FS groups. In the final stage, the two fused rankings are combined via the multi-objective genetic algorithm NSGA-III. To demonstrate the superiority of the proposed method, experiments are performed on the 20-Newsgroups and Reuters-21578 datasets with Support Vector Machine and K-Nearest Neighbors classifiers. Results show that RTEFS achieves higher accuracy and F-measure scores than its base counterparts.
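The TOPSIS aggregation step mentioned in the abstract can be illustrated with a minimal sketch. TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) scores each alternative by its relative closeness to an ideal point and its distance from an anti-ideal point. The abstract does not give the paper's normalization, weights, or criteria, so everything below is an assumption made for illustration: the function name `topsis_rank`, vector normalization, equal weights, and treating every FS score as a benefit criterion (higher is better). It is not the authors' implementation.

```python
import numpy as np


def topsis_rank(scores, weights=None):
    """Rank features (rows) from several FS scores (columns) with TOPSIS.

    Assumptions (not taken from the paper): every column is a benefit
    criterion, columns are vector-normalized, and weights default to equal.
    Returns (closeness, order), where order lists row indices best-first.
    """
    scores = np.asarray(scores, dtype=float)
    n_criteria = scores.shape[1]
    if weights is None:
        weights = np.full(n_criteria, 1.0 / n_criteria)
    weights = np.asarray(weights, dtype=float)

    # Vector-normalize each column, guarding against all-zero columns.
    norms = np.linalg.norm(scores, axis=0)
    norms[norms == 0] = 1.0
    weighted = scores / norms * weights

    # Ideal and anti-ideal points for benefit criteria.
    ideal = weighted.max(axis=0)
    anti_ideal = weighted.min(axis=0)

    # Euclidean distance of every feature to both reference points.
    d_pos = np.linalg.norm(weighted - ideal, axis=1)
    d_neg = np.linalg.norm(weighted - anti_ideal, axis=1)

    # Relative closeness in [0, 1]; defined as 0 when both distances vanish.
    denom = d_pos + d_neg
    closeness = np.divide(d_neg, denom, out=np.zeros_like(d_neg), where=denom > 0)

    # Stable sort so tied features keep their original order.
    order = np.argsort(-closeness, kind="stable")
    return closeness, order
```

For example, `topsis_rank([[0.9, 0.8], [0.1, 0.2], [0.5, 0.5]])` treats three features scored by two hypothetical FS methods; the first feature coincides with the ideal point and is ranked first, and the second coincides with the anti-ideal point and is ranked last.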