Feature Selection-Ranking Methods in a Very Large Electric Database
2004; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-540-24694-7_30
ISSN: 1611-3349
Authors: Manuel Mejía-Lavalle, Guillermo Rodríguez-Ortiz, Gustavo Figueroa, Eduardo F. Morales
Topic(s): Fuzzy Logic and Control Systems
Abstract: Feature selection is a crucial activity when knowledge discovery is applied to very large databases, as it reduces dimensionality and therefore the complexity of the problem. Its main objective is to eliminate attributes to obtain a computationally tractable problem without affecting the quality of the solution. Several feature selection methods have been proposed, some of them tested only on small academic datasets. In this paper we evaluate different feature selection-ranking methods on a very large real-world database related to a Mexican electric energy client-invoice system. Most research on feature selection methods evaluates only accuracy and processing time; here we also report on the amount of discovered knowledge and stress the issue of the boundary that separates relevant from irrelevant features. The evaluation was done using the Elvira and Weka tools, which integrate and implement state-of-the-art data mining algorithms. Finally, we propose a promising feature selection heuristic based on the experiments performed.