Peer-Reviewed Article

Aoba_v3 bot: a multimodal chatbot system combining rules and various response generation models

2023; Taylor & Francis; Volume: 37; Issue: 21; Language: English

DOI

10.1080/01691864.2023.2240883

ISSN

1568-5535

Authors

Shoji Moriya, Daiki Shiono, Riki Fujihara, Yosuke Kishinami, Subaru Kimura, Shusaku Sone, Reina Akama, Yuta Matsumoto, Jun Suzuki, Kentaro Inui

Topic(s)

Natural Language Processing Techniques

Abstract

In this paper, we present a multimodal dialogue system combining a neural response generation mechanism, a reranking mechanism, and a rule-based avatar control mechanism. Our system was submitted to the open track of the Fifth Dialogue System Live Competition and won second place. Remarkably, our system received the best human evaluation score for visual information control (i.e. the avatar's speaking style) in the preliminary round. The assessment by the competition evaluators revealed that our system generates natural utterances appropriate to the conversational context and topic, with an appealing speaking style. Through the analysis, we found that the techniques we introduced, such as post-processing of speech recognition results and the final response selection method, are effective, but we also found room for improvement, such as speech recognition errors and challenges in the reranking module.

Keywords: Dialogue systems and robots for competition; multimodal dialogue system

Acknowledgments

We thank the Fifth Dialogue System Live Competition committee for hosting the competition and providing some of the software. We also thank Ryoko Tokuhisa and Shiki Sato for their helpful feedback, and the members of the Tohoku NLP Group for their cooperation in the system development. We thank Enago (www.enago.jp) for the English language review.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. 'Response' refers only to what the system says and does not include nonverbal behaviors of the system such as gestures and facial expressions.
2. The software used for speech recognition, speech synthesis, and avatar control was provided by the Fifth Dialogue System Live Competition [Citation2].
3. We determined whether the input is Japanese based on its character codes (see the sketch following these notes).
4. The Jaccard coefficient is a measure of similarity between two sets; for sets X and Y it is defined as J(X, Y) = |X ∩ Y| / |X ∪ Y| (see the sketch following these notes).
5. To indicate the division points between multiple tweets in the input sequence, we prepared division tokens and inserted them between consecutive tweets.
6. https://github.com/nttcslab/japanese-dialog-transformers.
7. https://huggingface.co/rinna/japanese-gpt-1b/commit/a3c6e8478d5afa92fe5174b984555e01fe378cd3.
8. https://huggingface.co/datasets/allenai/c4.
9. http://data.statmt.org/cc-100/ja.txt.xz.
10. https://dumps.wikimedia.org/other/cirrussearch.
11. https://github.com/nttcslab/japanese-dialog-transformers.
12. When the dialogue context contains multiple utterances, they are separated by 〈s〉 for the aoba model and by [SEP] for the NTTCS and rinna GPT-2 models.
13. https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling.
14. Qualitative inspection of the response generation results for each of the three models revealed that the rinna GPT-2 model generated more natural responses than the other models. Therefore, we used multiple checkpoints of the rinna GPT-2 model in the system's response generation module to generate more natural and diverse response candidates.
15. https://huggingface.co/cl-tohoku/bert-base-japanese.
16. https://huggingface.co/rinna/japanese-roberta-base.
17. The images of facial expressions in the figure output by the software were provided by the Fifth Dialogue System Live Competition [Citation2].
18. The start utterance is 'Hello, my name is Erica. What shall we talk about?'.
19. The end utterance is 'It is almost time for me to leave. Thank you very much.'
20. The hand (right/left) and the position (high/low) were randomly selected each time from four different motions.
21. For the Twitter data, we randomly sampled 100,000 tweets posted between 2013 and 2018.
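As a rough illustration of notes 3 and 4, the following Python sketch computes the Jaccard coefficient over character sets and checks for Japanese text by Unicode range. The function names, the use of character-level sets, and the exact ranges are illustrative assumptions and are not taken from the authors' implementation.

```python
# Minimal sketch of notes 3 and 4 (illustrative; not the authors' exact code).

def jaccard(x: set, y: set) -> float:
    """Jaccard coefficient J(X, Y) = |X ∩ Y| / |X ∪ Y| (note 4)."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)

def contains_japanese(text: str) -> bool:
    """Character-code check for Japanese (note 3); assumed Unicode ranges."""
    return any(
        "\u3040" <= ch <= "\u30ff"      # Hiragana and Katakana
        or "\u4e00" <= ch <= "\u9fff"   # CJK unified ideographs
        for ch in text
    )

# Example: character-set similarity of two near-duplicate Japanese strings.
a, b = set("今日はいい天気ですね"), set("今日は天気がいいですね")
print(jaccard(a, b))               # 0.9
print(contains_japanese("Hello"))  # False
```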
Additional information

Funding

This work was supported by Japan Society for the Promotion of Science Grant Numbers JP22K17943 (empirical research) and JP22H00524, and by Moonshot Research and Development Program Grant Number JPMJMS2011 (fundamental research).

Notes on contributors

Shoji Moriya received a B.E. from the Department of Electrical, Information and Physics Engineering, School of Engineering, Tohoku University in 2023. He is currently a master's student in the Graduate School of Information Sciences, Tohoku University. His research interests include dialogue systems.

Daiki Shiono received a B.E. from the Department of Electrical, Information and Physics Engineering, School of Engineering, Tohoku University in 2022. He is currently a master's student in the Graduate School of Information Sciences, Tohoku University. His research interests include Vision & Language.

Riki Fujihara received a B.E. from the Department of Electrical, Information and Physics Engineering, School of Engineering, Tohoku University in 2021 and an M.S. in Information Science from Tohoku University in 2023. He is currently affiliated with Recruit Co., Ltd. His research interests include natural language processing.

Yosuke Kishinami received a B.E. from the Department of Electrical, Information and Physics Engineering, School of Engineering, Tohoku University in 2021 and an M.S. in Information Science from Tohoku University in 2023. He is currently affiliated with Future Corporation. His research interests include machine learning and natural language processing.

Subaru Kimura received a B.E. from the Department of Electrical, Information and Physics Engineering, School of Engineering, Tohoku University in 2022. He is currently a master's student in the Graduate School of Information Sciences, Tohoku University. His research interests include Vision & Language.

Shusaku Sone received a Ph.D. in Biomedical Engineering from Tohoku University, Japan, in 2015. He is currently a Ph.D. student in the Graduate School of Information Sciences, Tohoku University, and a researcher with Omron and Omron SINIC X Corp. His research interests include healthcare systems, human-computer interaction, and natural language processing. He is a member of the IEEE, ACM, and ACL.

Reina Akama received her Ph.D. in Information Science from Tohoku University in 2021. She has been an assistant professor at the Tohoku University Center for Data-driven Science and Artificial Intelligence since 2021 and a visiting researcher at the RIKEN Center for Advanced Intelligence Project since 2021. Her current research interests include machine learning and natural language processing.

Yuta Matsumoto received a B.E. from the Department of Electrical, Information and Physics Engineering, School of Engineering, Tohoku University in 2021 and an M.S. in Information Science from Tohoku University in 2023. He is currently affiliated with Recruit Co., Ltd. His research interests include machine learning and natural language processing.

Jun Suzuki is a full professor at the Center for Data-driven Science and Artificial Intelligence at Tohoku University. From 2001 to 2018, he was a researcher (Distinguished Researcher) at NTT Communication Science Laboratories, NTT Corporation. In 2005, while working at NTT CS Lab, he completed his doctoral dissertation and received a Ph.D. in Engineering from the Graduate School of Information Science, Nara Institute of Science and Technology. He joined Tohoku University in 2018 as an associate professor in the Graduate School of Information Sciences (GSIS) and has held his current position since 2020. He was also a visiting researcher at Google LLC (Google Brain Team) from 2020 to 2022 under a cross-appointment agreement. His research interests include machine learning and natural language processing.

Kentaro Inui is a professor at the Graduate School of Information Sciences, Tohoku University, where he heads the Natural Language Processing Lab. He also leads the Natural Language Understanding Team at the RIKEN Center for Advanced Intelligence Project. He has 30 years of experience in natural language processing and artificial intelligence. He currently serves as Chairperson of the Japanese Association for Natural Language Processing, a member of the Science Council of Japan, and Director of the NPO FactCheck Initiative Japan.
