Artigo Acesso aberto

Performance comparison of TF-IDF and Word2Vec models for emotion text classification

2021; Institute of Advanced Engineering and Science (IAES); Volume: 10; Issue: 5 Linguagem: Inglês

10.11591/eei.v10i5.3157

ISSN

2302-9285

Autores

Denis Eka Cahyani, Irene Patasik,

Tópico(s)

Edcuational Technology Systems

Resumo

Emotion is the human feeling when communicating with other humans or reaction to everyday events. Emotion classification is needed to recognize human emotions from text. This study compare the performance of the TF-IDF and Word2Vec models to represent features in the emotional text classification. We use the support vector machine (SVM) and Multinomial Naïve Bayes (MNB) methods for classification of emotional text on commuter line and transjakarta tweet data. The emotion classification in this study has two steps. The first step classifies data that contain emotion or no emotion. The second step classifies data that contain emotions into five types of emotions i.e. happy, angry, sad, scared, and surprised. This study used three scenarios, namely SVM with TF-IDF, SVM with Word2Vec, and MNB with TF-IDF. The SVM with TF-IDF method generate the highest accuracy compared to other methods in the first dan second steps classification, then followed by the MNB with TF-IDF, and the last is SVM with Word2Vec. Then, the evaluation using precision, recall, and F1-measure results that the SVM with TF-IDF provides the best overall method. This study shows TF-IDF modeling has better performance than Word2Vec modeling and this study improves classification performance results compared to previous studies.

Referência(s)