Linking Cognitive Tokens to Biological Signals: Dialogue Context Improves Neural Speech Recognizer Performance
2013; Wiley; Volume: 35; Issue: 35; Language: English
ISSN: 1551-6709
Authors: Richard Veale, Gordon Briggs, Matthias Scheutz
Topic(s): Neural Networks and Applications
Richard Veale (riveale@indiana.edu), Indiana University, 841 Eigenmann Hall, Bloomington, IN 47406 USA
Gordon Briggs (gbriggs@cs.tufts.edu), Tufts University, 200 Boston Ave., Medford, MA 02155 USA
Matthias Scheutz (matthias.scheutz@tufts.edu), Tufts University, 200 Boston Ave., Medford, MA 02155 USA

Abstract: This paper presents a hybrid cognitive model engaged in experiments demonstrating a successful mechanism for applying top-down contextual bias to a neural speech recognition system to improve its performance. The hybrid model includes a model of social dialogue moves, which it uses to selectively bias word recognition probabilities at a low level in the neural speech recognition system. The model demonstrates how symbolic and neurologically inspired components can successfully exchange information and mutually influence their processing. Furthermore, the biasing mechanism is grounded in brain mechanisms of perceptual decision making.

Keywords: Speech Recognition; Liquid State Machine; Dialogue Context; Top-Down Bias; Signal-to-Token Conversion

Introduction

Human cognition comprises high-level knowledge-based processes as well as low-level perceptual and motor processes, both of which are implemented via electro-chemical mechanisms in the brain. High-level cognitive processes are often viewed as symbolic and discrete, while low-level perceptual and motor processes are subsymbolic and continuous. Moreover, high-level processes are taken to operate on structured representations, while low-level processes will usually not be representational at all. Two key challenges in cognitive science are thus to understand (1) how high-level processes are realized in “neural hardware” and (2) how they can interact with low-level processes (e.g., how discrete symbolic knowledge can influence continuous subsymbolic processes and vice versa). We will focus on the second challenge in this paper.

Connectionist computational modeling has made significant progress in addressing (1) over the years, producing more and more refined neurologically plausible models of cognitive functions which are verified physiologically (e.g., Machens, Romo, & Brody, 2005). However, fewer efforts have been made to address (2). Only recently have hierarchical Bayesian models been proposed as a natural, systematic way to connect higher-level to lower-level processes (Kemp & Tenenbaum, 2008). Similar to the Bayesian approach, our goal is to understand the interactions between these two types of processes, which operate at fundamentally different levels.

Hierarchical Bayesian modeling often focuses on the “computational level” (Marr, 1982), showing how higher-level processes can influence lower levels (e.g., by showing how distributions of higher-level structures constrain distributions of lower-level items). In contrast, our approach attempts to address all three of Marr's levels (computational, algorithmic, and implementational) and their mutual interactions. This is because these levels cannot be considered in complete isolation in cases where higher-level processes have to interact with lower-level processes in real-time contexts with real-world inputs. Specifically, we claim that the nature and time course of low-level processes impose significant constraints on the possible ways of exchanging information with higher-level processes. Low-level processes will limit the types of computations that are allowed in higher-level processes that communicate with them, since they may have stringent timing requirements and will not wait for a computation to finish with a result. Proposals that do not incorporate those constraints might result in models that produce correct results under some empirical regimes, but which are infeasible given implementation constraints.
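The timing argument above can be made concrete with a toy sketch. It is not the paper's neural (liquid state machine) recognizer: the function names, the two-word vocabulary, the per-frame likelihoods, and the `ready_at` parameter are all illustrative assumptions. The sketch only shows that a contextual prior over words changes the outcome of an incremental recognizer if it becomes available while frames are still being processed, and has no effect if it arrives after the utterance has ended.

```python
def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {word: score / total for word, score in scores.items()}


def recognize(frames, context_prior, ready_at):
    """Toy incremental recognizer (illustrative assumption, not the paper's model).

    frames        -- list of dicts mapping each candidate word to a bottom-up likelihood
    context_prior -- dict mapping each candidate word to a top-down prior weight
    ready_at      -- frame index at which the high-level prior becomes available;
                     if it is >= len(frames), the prior arrives too late to be used
    """
    words = list(frames[0])
    belief = {word: 1.0 / len(words) for word in words}
    for t, frame in enumerate(frames):
        if t == ready_at:
            # The top-down bias is folded in once, at the moment it becomes available.
            belief = normalize({w: belief[w] * context_prior[w] for w in words})
        # Bottom-up evidence is integrated every frame, without waiting for the bias.
        belief = normalize({w: belief[w] * frame[w] for w in words})
    return max(belief, key=belief.get), belief


# Hypothetical acoustically ambiguous input that slightly favours "lift".
frames = [{"left": 0.5, "lift": 0.5}, {"left": 0.45, "lift": 0.55}]
# Hypothetical dialogue context (e.g., after "Which way?") favouring "left".
prior = {"left": 0.8, "lift": 0.2}

print(recognize(frames, prior, ready_at=0)[0])            # left
print(recognize(frames, prior, ready_at=len(frames))[0])  # lift
```

In this sketch the low-level loop never blocks on the high-level process, so a bias that is computed only after the whole utterance is available is simply ignored.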
For example, a hierarchical Bayesian model of natural language processing might be able to show that high-level knowledge about grammar can successfully bias low-level speech processing, but whether that particular computational way of biasing is actually feasible and realistic in humans can only be determined by taking algorithmic and implementation constraints into account. These constraints include time bounds caused by the incremental nature of the speech processor. In this case the high-level computation cannot expect to have access to a whole utterance before it starts biasing, since by that point the speech processor will already have advanced past the point where it is useful. Thus, although there are many ways in which higher levels could influence lower levels at the computational level, most of them are not realized in humans because of implementation or algorithmic constraints.

This paper makes three contributions: first, we will present a general way of integrating high-level processes operating on structured symbolic knowledge with low-level neural pro-
Reference(s)