A multilabel classification on topics of qur’anic verses in English translation using K-Nearest Neighbor method with Weighted TF-IDF
2019; IOP Publishing; Volume: 1192; Language: English
10.1088/1742-6596/1192/1/012026
ISSN: 1742-6596
Authors: Ganesha Ihya Ulumudin, Adiwijaya Adiwijaya, Mohamad Syahrul Mubarok
Topic(s): Advanced Text Analysis Techniques
Abstract: The Qur'an contains so much information that retrieving it manually is difficult, particularly for someone who wants to study the Qur'an in more depth. There is therefore a need to retrieve information by topic from verses that have already been classified, especially because a single verse may have more than one topic (multilabel). This research examines how to build a classifier for multilabel data, namely the topics of Qur'anic verses, using the k-Nearest Neighbor method. Two feature extraction methods are compared, Weighted TF-IDF and TF-IDF, and Weighted TF-IDF performs better than standard TF-IDF. The best result is obtained at k = 25, with an average Hamming loss of 0.134875 (lower is better). Further tests at this optimal k measure the effect of stopword removal and lemmatization: without stopword removal the result is 0.136375, without lemmatization it is 0.13537, and with neither stopword removal nor lemmatization the average Hamming loss is 0.1373125.
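To make the pipeline in the abstract concrete, the following is a minimal Python sketch of multilabel k-Nearest Neighbor classification over TF-IDF vectors, scored with Hamming loss. It is an illustration under stated assumptions, not the authors' implementation. It uses plain TF-IDF, because the abstract does not define the Weighted TF-IDF variant. Whitespace tokenization, cosine similarity, the per-label majority vote, and all function names (`fit_idf`, `vectorize`, `cosine`, `knn_predict`, `hamming_loss`) are assumptions made for this example.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Compute a smoothed IDF per term from the training documents."""
    n_docs = len(train_docs)
    doc_freq = Counter(t for doc in train_docs for t in set(doc.lower().split()))
    return {t: math.log(n_docs / df) + 1.0 for t, df in doc_freq.items()}


def vectorize(doc, idf):
    """Plain TF-IDF as a sparse dict; terms unseen in training are dropped."""
    tf = Counter(t for t in doc.lower().split() if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query_vec, train_vecs, train_labels, k):
    """Assign each label held by a strict majority of the k nearest neighbors."""
    nearest = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query_vec, train_vecs[i]),
        reverse=True,
    )[:k]
    n_labels = len(train_labels[0])
    return [
        1 if 2 * sum(train_labels[i][j] for i in nearest) > len(nearest) else 0
        for j in range(n_labels)
    ]


def hamming_loss(y_true, y_pred):
    """Fraction of (instance, label) pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total
```

With binary label vectors, one row per verse and one column per topic, `hamming_loss` returns a value between 0 and 1. The reported score of 0.134875 at k = 25 therefore means that roughly 13.5% of the verse-topic decisions were wrong.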