Article · Open access · Peer-reviewed

Sentiment Classification of Movie Reviews using Levenshtein Distance

2013; Digital Contents Society; Volume: 14; Issue: 4; Language: English

10.9728/dcs.2013.14.4.581

ISSN

2287-738X

Authors

Kwang-Mo Ahn, Yun-Suk Kim, Young-Hoon Kim, Young-Hoon Seo

Topic(s)

Text and Document Classification Technologies

Abstract

In this paper, we propose a sentiment classification method that uses Levenshtein distance. We generate a BOW (bag-of-words) representation by applying Levenshtein distance to sentiment features and use it as the training set. The machine learning algorithms we use are SVMs (Support Vector Machines) and NB (Naive Bayes). As the data set, we gathered 2,385 movie reviews from an online movie community (Daum movie service). From the collected reviews, we manually extracted sentiment words and compiled a list of 778 words. In the experiment, we train the classifiers on the previously generated BOW, in which Levenshtein distance was applied to the sentiment words, and then evaluate classifier performance by 10-fold cross-validation. In the evaluation, Multinomial Naive Bayes achieved an accuracy of 85.46% when the Levenshtein distance was 3. The experimental results indicate that, with this method, classification performance is less affected by spelling errors in the documents.
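The abstract does not spell out how the Levenshtein-based BOW is built, so the following is only a minimal sketch of one plausible reading: each review token is mapped to its nearest sentiment word, and counted if the edit distance is within a threshold. The function names `levenshtein` and `fuzzy_bow`, the toy English lexicon, and the nearest-word matching rule are all assumptions made for illustration, not the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_bow(tokens, lexicon, max_dist):
    """Hypothetical BOW: count each token toward its nearest lexicon word,
    but only when that word lies within max_dist edits (an assumed rule)."""
    counts = {word: 0 for word in lexicon}
    for tok in tokens:
        best = min(lexicon, key=lambda w: levenshtein(tok, w))
        if levenshtein(tok, best) <= max_dist:
            counts[best] += 1
    return counts


# Toy example: "graet" is a misspelling of "great"; "xyz" matches nothing.
lexicon = ["great", "boring", "awful"]
bow = fuzzy_bow(["graet", "boring", "xyz"], lexicon, max_dist=2)
```

With a threshold of 0 this reduces to an exact-match BOW; raising the threshold lets misspelled tokens still contribute to the count of the intended sentiment word, which is the effect the abstract attributes to the method. The resulting count vectors could then be passed to a classifier such as Multinomial Naive Bayes.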

Reference(s)