Artigo Revisado por pares

Detecting Entities of Works for Chinese Chatbot

2020; Association for Computing Machinery; Volume: 19; Issue: 6 Linguagem: Inglês

10.1145/3414901

ISSN

2375-4702

Autores

Chuhan Wu, Fangzhao Wu, Tao Qi, Junxin Liu, Yongfeng Huang, Xing Xie,

Tópico(s)

Advanced Text Analysis Techniques

Resumo

Chatbots such as Xiaoice have gained huge popularity in recent years. Users frequently mention their favorite works such as songs and movies in conversations with chatbots. Detecting these entities can help design better chat strategies and improve user experience. Existing named entity recognition methods are mainly designed for formal texts, and their performance on the informal chatbot conversation texts may not be optimal. In addition, these methods rely on massive manually annotated data for model training. In this article, we propose a neural approach to detect entities of works for Chinese chatbot. Our approach is based on a language model (LM) long-short term memory (LSTM) convolutional neural network (CNN) conditional random value (CRF), or LM-LSTM-CNN-CRF, framework, which contains a language model to generate context-aware character embeddings, a Bi-LSTM network to learn contextual character representations from global contexts, a CNN to learn character representations from local contexts, and a CRF layer to jointly decode the character label sequence. In addition, we propose an automatic text annotation method via quote marks to reduce the effort of manual annotation. Besides, we propose an iterative data purification method to improve the quality of the automatically constructed labeled data. Massive experiments on a real-world dataset validate that our approach can achieve good performance on entity detection for Chinese chatbots.

Referência(s)