Article (open access, peer reviewed)

Sentiment analysis by using Naïve‐Bayes classifier with stacked CARU

2022; Institution of Engineering and Technology; Volume: 58; Issue: 10; Pages: 411-413; First published: 08 April 2022; Language: English

10.1049/ell2.12478

ISSN

1350-911X

Authors

Ka‐Hou Chan (corresponding author; [email protected]; ORCID 0000-0002-0183-0685), Faculty of Applied Sciences, Macao Polytechnic University, Macau, China. Sio‐Kei Im (ORCID 0000-0002-5599-4300), Macao Polytechnic University, Macau, China.

Topic(s)

Stock Market Forecasting Methods

Abstract

A long sequence always contains long-term dependency problems, which makes paragraph-based sentiment analysis a very challenging task that is difficult to handle with a simple RNN network. This letter proposes a stacked CARU network to extract the main information in a paragraph. The resulting network also shows how to use a CNN-based extractor to explore complete passages and capture useful features in their hidden states. In particular, instead of using the Softmax function, a Naïve-Bayes classifier is connected to the end of the CNN-based extractor. The proposed model also takes into account the conditional independence of the observed results given the hidden variables, projecting features into a probability distribution that is appreciated for its simplicity and interpretability. The advantages of the model in sentiment analysis are empirically investigated by combining the usual classifiers with GloVe embeddings on the SST-5 and IMDB datasets.

Introduction

Sentiment analysis is a hot topic in classification tasks. In recent years, it has been studied within artificial intelligence and natural language processing (NLP). With the rapid growth of social media information obtained from the Internet and the development of NLP techniques, opportunities arise to mine these data in various fields, such as tourism, marketing or politics. However, differences in writing styles and occasions make it difficult to extract the main information of a passage. In sentiment analysis, researchers investigate the sentiments expressed by the writer on a topic. Many tasks can be handled as opinion mining, such as movie reviews, product ratings and pre-election voting comments. More broadly, sentiment analysis aims to accomplish text analysis through the scientific techniques of NLP and machine learning [1, 2]. In the document approach, the sentiment of an entire paragraph can be roughly classified as positive, negative or neutral.
Sentiment analysis of documents and paragraphs is also referred to as contextual sentiment analysis. In this type of analysis, an entity may have more than one feature (aspect) to consider. Therefore, sentence tokenisation needs to be performed as pre-processing before advanced classification of individual sentences can be considered. In general, the sentiment reflected by a passage can be characterised by a number of features, including part of speech, sentence length and structural content. Ref. [3] first applied linear RNNs to NLP tasks, but in such models every input feature carries the same weight, which often conflicts with the varying importance of individual words: a word may have other meanings or different importance in different combinations. In particular, punctuation and prepositions tend to be less important than other words, and the main information is diluted if a linear RNN model is used. Deep learning-based approaches, by contrast, have the ability to learn high-level sentiment features [4, 5]. Ref. [6] proposes an advanced RNN for processing multiple aspects in a sentence. In addition, ref. [7] proposed a multilayer structure that achieves better accuracy on multiple benchmark datasets, and a content-adaptive recurrent unit (CARU) was designed in ref. [8] to exploit the relationship between part-of-speech and sentiment terms. However, all the above methods work on sentence-based input features and are not effective in the paragraph-based case, because all RNNs so far can only consider the current and previous contents of sequence data. Taking the following content into account, to achieve complete sentence understanding, requires a larger network model and is beyond the capability of individual RNN units.

Framework

Machine learning methods for classification problems are commonly realised with well-known neural network algorithms. In order to design a network that adapts to more cases when handling the embedding vectors of words, our proposed model is shown in Figure 1; the following sections detail the two main innovations of this work, namely the stacked CARU network and the Naïve-Bayes classifier [9].

Figure 1: Schematic diagram of our proposed sentiment analysis model. The stacked CARU extracts the main informative and discriminative features from the embedded vectors; a Naïve-Bayes classifier then predicts the probability distribution of the feature input types to extract more effective features for better decision-making.

As shown in Figure 1, the received paragraph is projected into embedding vectors by a support vector machine (SVM) technique [10], with sentence/word tokenisation done as a preprocessing operation. This means that each paragraph is tagged as a sequence of sentences $s^{(t)}$, which are further tagged as sequences of word embeddings $e^{(t)}_i$. Based on the sequence of input features, the output features are finally classified by a fully connected (FC) network (a set of linear layers). Several types of linear classifiers aim to discover higher-level features in $N$-dimensional space [11]. In neural network classification, neurons are arranged in layers to convert input vectors into output vectors, predicting the outcome class from a probability distribution rather than from the most probable observation alone.
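As a minimal sketch of this preprocessing stage, the snippet below splits a paragraph into sentences $s^{(t)}$ and words, then looks each word up in a pretrained GloVe table. The tokeniser choice (NLTK) and the in-memory `glove` dictionary are illustrative assumptions, not the authors' exact pipeline:

```python
# Sketch only: NLTK tokenisers and an in-memory GloVe dict are assumptions.
import torch
from nltk.tokenize import sent_tokenize, word_tokenize

def embed_paragraph(paragraph: str, glove: dict) -> list:
    """Map a paragraph to a list of per-sentence embedding tensors.

    Returns one tensor of shape (num_words, dim) per sentence s^(t),
    whose rows are the word vectors e_i^(t)."""
    embedded = []
    for sentence in sent_tokenize(paragraph):          # sentence sequence s^(t)
        vectors = [torch.as_tensor(glove[w.lower()])   # word vectors e_i^(t)
                   for w in word_tokenize(sentence)
                   if w.lower() in glove]              # skip out-of-vocabulary words
        if vectors:
            embedded.append(torch.stack(vectors))
    return embedded
```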
The basic principle of the Naïve-Bayes classifier is that "each pair of attributes to be classified should be independent of each other". This strong independence between attributes is a key feature of Bayesian network classifiers [12]. In the maximum-entropy view, the least informative weights are considered first and then optimised to find the weights that maximise the likelihood of the data. In view of this, our proposed network consists of two stages, which also constitute the main contribution of this work. The first stage focuses on sentence analysis: we extend our previous work [8] to arrange CARUs in a stacked structure, where the output (hidden state) of each CARU unit is connected to the input of the higher-level unit, which also allows for different dynamic lengths of the data stream. The second stage converts the hidden states into a probabilistic model and demonstrates its efficiency in sentiment analysis compared with the usual form of the Naïve-Bayes classifier.

Stacked CARU

Well-designed feature extraction is essential for NLP tasks that capture the nuances between words and sentences in a paragraph. Some previous work has utilised additional information to train a more comprehensive feature extractor [13]. In this letter, we disregard such extra information and propose a stacked CARU to obtain discriminative embeddings. The stacked CARU consists of several single CARU layers, as CARU and its variants have been among the best overall performing RNNs for NLP tasks. As shown in Figure 2, CARU contains an update gate like the GRU, but introduces a content-adaptive gate instead of a reset gate.

Figure 2: The CARU architecture.

CARU is designed to alleviate the long-term dependency problem of RNN models. It has been found to provide slight performance improvements on NLP tasks with fewer parameters than the GRU, and has achieved outstanding results in a variety of sequence-to-sequence applications such as translation. This feature can also be learned in the middle layers, as recent developments in embedded vector representation learning have yielded excellent results in language modelling. Its learning approach can be thought of as a collection of units in which real-valued vectors are trained to capture the underlying meaning of the input features. Hidden-state learning extends several word-level and sentence-level RNN architectures to support many peer units in the middle layers. The proposed model adopts a multistep training strategy in which sentence- and word-level language models are developed in a pipeline fashion. CARU can be viewed as simultaneous neural tracking of hierarchical language structures: it provides a way to progressively integrate smaller linguistic units into larger structures as the number of layers increases, giving a better account of the hierarchical nature of language organisation. CARU facilitates NLP analysis of content because its content-adaptive gate takes both the current word and the accumulated content into account. Thus, it allows each layer to clearly do its job and also helps to solve the problem of long-term dependencies. As shown in Figure 3, a typical first-in, last-out (FILO) architecture uses the update gates and content-adaptive gates contributed by CARU, which has two inputs and a hidden state as output. This ensures successful feature extraction for discriminative learning of word embeddings.
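To make the structure concrete, below is a minimal PyTorch sketch of a CARU cell and its stacked arrangement. The gating equations are our reading of the description above (a GRU-style update gate combined with a content-adaptive gate derived from the projected word feature), not a verbatim reproduction of ref. [8]; all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class CARUCell(nn.Module):
    """Content-Adaptive Recurrent Unit cell: an update gate as in the GRU,
    with a content-adaptive gate in place of the reset gate (after ref. [8];
    the exact equations here are an assumption, not the published ones)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.x_proj = nn.Linear(input_size, hidden_size)    # current-word feature
        self.h_proj = nn.Linear(hidden_size, hidden_size)   # previous hidden state
        self.update = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        x_feat = self.x_proj(x)                                      # word content
        n = torch.tanh(x_feat + self.h_proj(h))                      # candidate state
        z = torch.sigmoid(self.update(torch.cat((x, h), dim=-1)))    # update gate
        l = torch.sigmoid(x_feat) * z                                # content-adaptive gate
        return (1.0 - l) * h + l * n                                 # new hidden state

class StackedCARU(nn.Module):
    """Stack of CARU layers; each layer's hidden state feeds the next layer."""
    def __init__(self, input_size: int, hidden_size: int, num_layers: int = 2):
        super().__init__()
        sizes = [input_size] + [hidden_size] * num_layers
        self.cells = nn.ModuleList(
            CARUCell(i, h) for i, h in zip(sizes[:-1], sizes[1:]))

    def forward(self, seq):                            # seq: (T, input_size)
        hs = [seq.new_zeros(c.h_proj.in_features) for c in self.cells]
        for x in seq:                                  # one word embedding at a time
            for k, cell in enumerate(self.cells):
                hs[k] = cell(x if k == 0 else hs[k - 1], hs[k])
        return hs[-1]                                  # last layer's hidden state
```

For example, `StackedCARU(300, 128)(sentence)` on a tensor of 300-dimensional word embeddings would yield a 128-dimensional hidden state summarising the sentence, which then feeds the pooling and classification stages.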
Each layer in the stacked CARU has its own unit, which receives the output features of the unit below according to the principle above. Because training can be spread over many steps, the sequence of layers in the RNN can be seen as a transformation from the "concrete" to the "abstract". This is important because it may allow the network to compute and adapt to more complex representations. A lower layer typically discriminates the part-of-speech of each word and filters out noise such as punctuation; after the main features are extracted, they are further processed by higher layers. Deep networks are usually based on this concept, making it possible for shallow units to process low-level features (such as syntax and noise discarding), while deeper units determine higher-level feature patterns (such as semantics). After all layers have been processed, the hidden state created by the last unit is passed to the next module. In our design, the end of the stacked CARU is connected to Chebyshev pooling, an advanced pooling layer designed to provide the best extraction for the classification of probabilistic projections [14].

Figure 3: A stacked implementation of CARU.

Naïve-Bayes classifier

The Naïve-Bayes classifier assumes that each feature is influenced only by the class to which it belongs, reflecting the fact that the class is the primary connection to each hidden state. This approach ensures good generalisation under an explicit set of assumptions. In practice, the features are not truly independent; they carry undiscovered, implicit relationships. Even so, the Naïve-Bayes classifier proves constructive in the face of these violations. Due to its straightforward structure, Naïve-Bayes is fast, easy to use and effective. It is also suitable for high-dimensional data, since the probability of each feature is calculated separately. Let $C$ denote the class used for prediction, and consider the values $X$ of hidden random variables in a discrete finite set. The class of an observation $X$ can be predicted by applying the Naïve-Bayes rule, a probabilistic model over these variables that is particularly appreciated for its simplicity and interpretability:

\begin{equation} P(C|X)=\frac{P(C)\,P(X|C)}{P(X)} \end{equation} (1)

Under the Naïve-Bayes assumption that the features $X_0, X_1, \ldots$ are conditionally independent of each other given the class, this becomes:

\begin{equation} P(C|X)=\frac{P(C)\prod^N_i P(X_i|C)}{P(X)} \end{equation} (2)

The classification induced by the Naïve-Bayes classifier under supervised learning always uses the form of Equation (2); it suffices to predict the most likely class for a given test observation, as in sentiment analysis or text classification.

Figure 4: Probabilistic oriented graph of the Naïve-Bayes classifier.

In order to apply the Naïve-Bayes classifier to continuously valued features and estimate the class probability $P(C)$ and the conditional probabilities $P(X_i|C)$ with $i=1,2,\ldots$ in Figure 4, we first discretise the received features, converting continuous values to discrete ones. A simple way to do this is median-based discretisation, which maps continuous features into the set $\{0.0, 1.0\}$; we can then use the discrete orthogonal matrix generation method, the most frequently applied discretisation method in the literature [15].
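A compact sketch of this classifier under our assumptions: features are binarised at their training-set medians (into $\{0.0, 1.0\}$), the probabilities of Equation (2) are estimated by counting with Laplace smoothing (an illustrative choice), and prediction is done in log space, where the class-independent denominator $P(X)$ can be dropped:

```python
import torch

def fit_naive_bayes(features, labels, num_classes, alpha=1.0):
    """Estimate log P(C) and log P(X_i|C) from median-binarised features.

    features: (num_samples, N) float tensor of hidden-state features.
    labels:   (num_samples,) long tensor of class indices."""
    medians = features.median(dim=0).values
    x = (features > medians).float()               # discretise into {0.0, 1.0}
    log_prior, log_lik = [], []
    for c in range(num_classes):
        xc = x[labels == c]
        log_prior.append(torch.log(torch.tensor(xc.shape[0] / x.shape[0])))
        p = (xc.sum(dim=0) + alpha) / (xc.shape[0] + 2 * alpha)  # Laplace smoothing
        log_lik.append(torch.stack((torch.log(1 - p), torch.log(p))))
    return medians, torch.stack(log_prior), torch.stack(log_lik)

def predict(sample, medians, log_prior, log_lik):
    """Most likely class for one feature vector, per Equation (2) in log space."""
    x = (sample > medians).long()                  # discretised observation X
    scores = log_prior.clone()
    for c in range(len(log_prior)):
        # add sum_i log P(X_i = x_i | C = c); P(X) is constant across classes
        scores[c] += log_lik[c, x, torch.arange(len(x))].sum()
    return int(scores.argmax())
```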
Afterwards, the Naïve-Bayes classifier can be written in the discriminative form of neural network functions. Because this neural model combines the functionality of the Naïve-Bayes classifier, it has the advantage of considering complex observed features. In addition, we employ a self-weighting module [16], an additional neural model that enhances the objective of the Naïve-Bayes classifier given the hidden random variables, to improve sentiment analysis. Finally, we empirically investigate the contribution of these innovations to the sentiment analysis task.

Experiment

Our target is to perform several experiments on sentiment analysis. As indicated in Figure 1, the inputs are paragraphs, or sets of content-related sentences, from two text classification datasets: SST-5 [17] and IMDB [18]. All our experiments are conducted on an NVIDIA GeForce RTX 2080 Ti with 11 GB of video memory. In order to compare the proposed strategy with the original networks, we re-implement in PyTorch [24] several recent state-of-the-art baselines, L-MIXED [19], Neural Naïve-Bayes [20] and variable-depth [21], as well as the traditional models CNN+LSTM [22] and CEN-tpc [23]. All experiments use the same dataset in each test with a batch size of 100 per iteration and the same configuration. In addition, a scheduler adjusts the learning rate, reducing it when the loss becomes stagnant.

The experimental results given in Table 1 show that the proposed model performs better than the others. As expected, the proposed model improves on the runner-up by more than 1.0% on the SST-5 dataset with GloVe embeddings. On IMDB, in contrast, our result is very close to the best one but does not exceed it. The reason is that the IMDB dataset mainly consists of short content comments, mostly phrases, from which sentiment features are hard to extract. This confirms the importance of considering complex features and indicates the potential of the proposed model in sentiment analysis. According to these results on the test set of each dataset, the improvement in accuracy is due to the stacked CARU layers employed, which allow us to build a more accurate model.

Table 1. Comparison with other state-of-the-art methods; accuracy (%). The best result on each dataset (bold in the original) is the proposed model on SST-5 and L-MIXED on IMDB.

Model                   | SST-5       | IMDB
Proposed model          | 59.4 ± 0.14 | 94.69 ± 0.08
L-MIXED [19]            | 57.6 ± 0.20 | 94.72 ± 0.11
Variable-depth [21]     | 57.5 ± 0.16 | 93.57 ± 0.21
CEN-tpc [23]            | 55.2 ± 0.21 | 93.20 ± 0.29
Neural Naïve-Bayes [20] | 55.1 ± 0.18 | 93.03 ± 0.15
CNN+LSTM [22]           | 51.4 ± 0.27 | 88.01 ± 0.18
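For reproducibility, here is a sketch of the optimiser and learning-rate scheduler implied by the training configuration above; the letter only states that the rate is reduced when the loss stagnates, so the choice of Adam and the factor/patience values below are assumptions:

```python
import torch

def make_training_setup(model: torch.nn.Module, lr: float = 1e-3):
    """Optimiser plus a plateau scheduler that lowers the learning rate
    once the epoch loss stops improving (factor/patience are illustrative)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=2)
    return optimizer, scheduler

# Per epoch: train on batches of 100 samples, compute the mean loss, and
# call scheduler.step(epoch_loss) so the rate drops when the loss stagnates.
```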
Conclusion

We provide in this letter a novel and effective model for sentiment analysis in NLP tasks that extracts pattern information and discriminative embedding vectors. Our proposed stacked CARU aims to extract the primary information and integrates stacked connectivity for optimal feature extraction. We also use the Naïve-Bayes classifier to explore and capture the important information of the hidden states, generating full connectivity to obtain probability distributions for the analysis of neural processing. Our approach offers a good way to accomplish this paragraph-based document classification process and can be used directly with the associated neural structures, since it incurs almost no additional cost. The experiments show that the proposed model obtains results comparable to the latest state-of-the-art models. This model will be further investigated in the future and compared on other NLP tasks and other classification applications.

Funding information

This work is funded by the Faculty of Applied Sciences, Macao Polytechnic University, Macau, China.

Conflict of interest

The authors declare no conflict of interest.

Data availability statement

Data derived from public domain resources.

References

1. Cambria, E., Das, D., Bandyopadhyay, S., Feraco, A.: Affective computing and sentiment analysis. In: A Practical Guide to Sentiment Analysis, pp. 1-10. Springer, Cham (2017)
2. Patel, K., Mehta, D., Mistry, C., Gupta, R., Tanwar, S., Kumar, N., et al.: Facial sentiment analysis using AI techniques: state-of-the-art, taxonomies, and challenges. IEEE Access 8, 90495-90519 (2020)
3. Mikolov, T., Kombrink, S., Burget, L., Cernocky, J., Khudanpur, S.: Extensions of recurrent neural network language model. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5528-5531. IEEE, Piscataway, NJ (2011)
4. Chen, Z., Xue, Y., Xiao, L., Chen, J., Zhang, H.: Aspect-based sentiment analysis using graph convolutional networks and co-attention mechanism. In: Communications in Computer and Information Science, pp. 441-448. Springer, Cham (2021)
5. Li, W., Shao, W., Ji, S., Cambria, E.: BiERU: Bidirectional emotional recurrent unit for conversational sentiment analysis. Neurocomputing 467, 73-82 (2022)
6. Marcheggiani, D., Täckström, O., Esuli, A., Sebastiani, F.: Hierarchical multi-label conditional random fields for aspect-oriented opinion mining. In: Lecture Notes in Computer Science, pp. 273-285. Springer, Cham (2014)
7. Alboaneen, D.A., Tianfield, H., Zhang, Y.: Sentiment analysis via multi-layer perceptron trained by meta-heuristic optimisation. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 4630-4635. IEEE, Piscataway, NJ (2017)
8. Chan, K.H., Ke, W., Im, S.K.: CARU: A content-adaptive recurrent unit for the transition of hidden state in NLP. In: Neural Information Processing, pp. 693-703. Springer, Cham (2020)
9. Zolnierek, A., Rubacha, B.: The empirical study of the Naïve Bayes classifier in the case of Markov chain recognition task. Adv. Soft Comput. 3, 329-336 (2001)
10. Suthaharan, S.: Support vector machine. In: Machine Learning Models and Algorithms for Big Data Classification, pp. 207-235. Springer, Cham (2016)
11. Chan, K.H., Im, S.K., Ke, W.: Self-adaptive layer: An application of function approximation theory to enhance convergence efficiency in neural networks. In: 2020 International Conference on Information Networking (ICOIN), pp. 447-452. IEEE, Piscataway, NJ (2020)
12. Bielza, C., Larrañaga, P.: Discrete Bayesian network classifiers. ACM Computing Surveys 47(1), 1-43 (2014)
13. Ren, R., Liu, Z., Li, Y., Zhao, W.X., Wang, H., Ding, B., et al.: Sequential recommendation with self-attentive multi-adversarial network.
In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 89-98. ACM, New York (2020)
14. Chan, K.H., Pau, G., Im, S.K.: Chebyshev pooling: An alternative layer for the pooling of CNNs-based classifier. In: 2021 IEEE 4th International Conference on Computer and Communication Engineering Technology (CCET), pp. 106-110. IEEE, Piscataway, NJ (2021)
15. Chan, K.H., Ke, W., Im, S.K.: A general method for generating discrete orthogonal matrices. IEEE Access 9, 120380-120391 (2021)
16. Chan, K.H., Im, S.K., Zhang, Y.: A self-weighting module to improve sentiment analysis. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1-6. IEEE, Piscataway, NJ (2021)
17. Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A.Y., et al.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631-1642. Association for Computational Linguistics, Seattle, WA (2013)
18. AlBadani, B., Shi, R., Dong, J.: A novel machine learning approach for sentiment analysis on Twitter incorporating the universal language model fine-tuning and SVM. Appl. Syst. Innovation 5(1), 13 (2022)
19. Sachan, D.S., Zaheer, M., Salakhutdinov, R.: Revisiting LSTM networks for semi-supervised text classification via mixed objective function. Proc. AAAI Conf. Artif. Intell. 33(1), 6940-6948 (2019)
20. Gautam, J., Atrey, M., Malsa, N., Balyan, A., Shaw, R.N., Ghosh, A.: Twitter data sentiment analysis using Naïve Bayes classifier and generation of heat map for analyzing intensity geographically. In: Advances in Applications of Data-Driven Computing, pp. 129-139. Springer, Singapore (2019)
21. Chan, K.H., Im, S.K., Ke, W.: Variable-depth convolutional neural network for text classification. In: Communications in Computer and Information Science, pp. 685-692. Springer, Cham (2020)
22. Camacho Collados, J., Pilehvar, M.T.: On the role of text preprocessing in neural network architectures: an evaluation study on text categorization and sentiment analysis. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 40-46. Association for Computational Linguistics, Cedarville, OH (2018)
23. Ito, T., Tsubouchi, K., Sakaji, H., Yamashita, T., Izumi, K.: Contextual sentiment neural network for document sentiment analysis. Data Sci. Eng. 5(2), 180-192 (2020)
24. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Conference on Neural Information Processing Systems, pp. 8024-8035. Curran Associates, Red Hook, NY (2019)
