Article · Open access · Peer-reviewed

Representation of Semantic Word Embeddings Based on SLDA and Word2vec Model

2023; Institution of Engineering and Technology; Volume: 32; Issue: 3; Language: English

DOI

10.23919/cje.2021.00.113

ISSN

2075-5597

Authors

Tang Huanling, Hui Zhu, Wei Hongmin, Han Zheng, Mao Xueli, Mingyu Lu, Jin Guo

Topic(s)

Text and Document Classification Technologies

Abstract

To address the problem of semantic loss in text representation, this paper proposes a new word-embedding method in semantic space, called wt2svec, based on supervised latent Dirichlet allocation (SLDA) and Word2vec. It generates a global topic embedding word vector using SLDA, which discovers global semantic information through the latent topics of the whole document set, and it derives a local semantic embedding word vector from Word2vec. The new semantic word vector is obtained by combining the global semantic information with the local semantic information. Additionally, a document semantic vector, named doc2svec, is generated. Experimental results on different datasets show that the wt2svec model significantly improves the accuracy of semantic word similarity and the performance of text categorization compared with Word2vec.
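
The abstract does not give the exact combination rule, but the general recipe (a global topic-based word vector from a topic model joined with a local Word2vec vector, plus an aggregated document vector) can be sketched. The snippet below is a minimal illustration, not the paper's implementation: it assumes concatenation as the combination step, averaging as the doc2svec aggregation, and uses unsupervised LDA from gensim as a stand-in for SLDA (gensim provides no SLDA). The toy corpus and the wt2svec/doc2svec helper names mirror the paper's terminology but are hypothetical here.

```python
# Sketch of the wt2svec idea: global topic vector + local Word2vec vector.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

docs = [
    ["machine", "learning", "text", "classification"],
    ["topic", "models", "discover", "latent", "semantics"],
    ["word", "embeddings", "capture", "local", "context"],
    ["text", "semantics", "word", "context", "models"],
]

# Local semantic word vectors (Word2vec).
w2v = Word2Vec(docs, vector_size=16, min_count=1, window=2, seed=0)

# Global topic word vectors: p(topic | word), derived from the
# topic-word matrix of a plain LDA model (stand-in for SLDA).
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=4, random_state=0)
phi = lda.get_topics()                    # shape: (num_topics, vocab_size)
topic_given_word = phi / phi.sum(axis=0)  # normalize each vocabulary column

def wt2svec(word):
    """Concatenate the global topic vector with the local Word2vec vector
    (concatenation is an assumption; the paper's rule may differ)."""
    global_vec = topic_given_word[:, dictionary.token2id[word]]
    local_vec = w2v.wv[word]
    return np.concatenate([global_vec, local_vec])

def doc2svec(doc):
    """Document vector as the mean of its words' wt2svec vectors (assumed)."""
    return np.mean([wt2svec(w) for w in doc], axis=0)

print(wt2svec("text").shape)    # (4 topics + 16 w2v dims,) -> (20,)
print(doc2svec(docs[0]).shape)  # (20,)
```

With vectors of this form, word similarity can be scored with cosine distance and the doc2svec vectors fed to any standard classifier for the text-categorization comparison the abstract describes.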
