Artigo Acesso aberto Revisado por pares

Forum Duplicate Question Detection by Domain Adaptive Semantic Matching

2020; Institute of Electrical and Electronics Engineers; Volume: 8; Linguagem: Inglês

10.1109/access.2020.2982268

ISSN

2169-3536

Autores

Zhuojia Xu, Hua Yuan,

Tópico(s)

Natural Language Processing Techniques

Resumo

Community Question Answering (CQA) forums, such as Stack Overflow, Stack Exchange and Massive Open Online Course (MOOC) forums, spend a lot of manpower and time to manage duplicate questions on the forum. Mismatch of duplicate questions makes users keep asking “new” questions, and the continuous accumulation of duplicate questions may interfere with their information searching again, affecting user satisfaction. Neural Networks (NN) models for parsing semantics provide the possibility of end-to-end duplicate question detection. Whereas, due to lack of domain data and expertise, NN models for semantic parsing are rarely directly applied to CQA duplicate question detection. This paper proposes a Semantic Matching Model (SMM) integrated with the multi-task transfer learning framework for multi-domain forum duplicate question detection. By designing the word-to-sentence interaction mechanism based on the word-to-word interaction, SMM can automatically choose to ignore or pay attention to potential similar words according to the semantics at the sentence level. The experiments on the benchmark data set and MOOC forum data set state that SMM outperforms baselines, its interaction mechanism is effective and it has an advantage in cross-domain duplicate question detection.

Referência(s)