Peer-reviewed article

Feature selection for text classification with Naïve Bayes

2008; Elsevier BV; Volume: 36; Issue: 3; Language: English

10.1016/j.eswa.2008.06.054

ISSN

1873-6793

Authors

Jingnian Chen, Houkuan Huang, Shengfeng Tian, Youli Qu

Topic(s)

Advanced Text Analysis Techniques

Abstract

As an important preprocessing technique in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should take both domain and algorithm characteristics into account. Because the Naïve Bayesian classifier is simple, efficient and highly sensitive to feature selection, research on feature selection designed specifically for it is significant. This paper presents two feature evaluation metrics for the Naïve Bayesian classifier applied to multi-class text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification experiments with Naïve Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM and MOR select features clearly better than other feature selection approaches.
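
The abstract names the two metrics but does not give their formulas. As a rough illustration of per-term scoring for multi-class feature selection, the sketch below assumes a log-ratio form: each term's score is the sum over classes of |log(P(w|c) / P(w|not c))|, with probabilities estimated from smoothed document frequencies. This form, the smoothing, the function names `class_discriminating_scores` and `select_top_k`, and the toy data are all assumptions made for illustration and may differ from the definitions used in the paper.

```python
import math
from collections import Counter


def class_discriminating_scores(docs, labels, alpha=1.0):
    """Score each term by summing |log(P(w|c) / P(w|not c))| over all classes.

    Assumption: P(w|c) is the add-alpha smoothed fraction of documents in
    class c that contain w; this is not taken from the paper.
    """
    classes = sorted(set(labels))
    vocab = sorted({w for doc in docs for w in doc})
    docs_per_class = Counter(labels)
    total_docs = len(docs)

    # doc_freq[c][w] = number of documents of class c that contain w
    doc_freq = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        doc_freq[label].update(set(doc))

    scores = {}
    for w in vocab:
        df_all = sum(doc_freq[c][w] for c in classes)
        score = 0.0
        for c in classes:
            n_in = docs_per_class[c]
            n_out = total_docs - n_in
            p_in = (doc_freq[c][w] + alpha) / (n_in + 2 * alpha)
            p_out = (df_all - doc_freq[c][w] + alpha) / (n_out + 2 * alpha)
            score += abs(math.log(p_in / p_out))
        scores[w] = score
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy three-class corpus (hypothetical data); "the" appears in every document.
docs = [
    ["the", "goal", "match", "team"],
    ["the", "team", "win", "goal"],
    ["the", "stock", "market", "price"],
    ["the", "market", "trade", "price"],
    ["the", "film", "actor", "scene"],
    ["the", "actor", "film", "award"],
]
labels = ["sport", "sport", "finance", "finance", "film", "film"]

scores = class_discriminating_scores(docs, labels)
selected = select_top_k(scores, 6)
print(selected)
```

In this toy corpus the term "the" occurs equally often in every class, so it receives the lowest score and is dropped, while terms concentrated in a single class are kept. The selected terms would then serve as the vocabulary for a Naïve Bayes classifier.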
