Review · Open access · Peer reviewed

What are large language models supposed to model?

2023; Elsevier BV; Volume: 27; Issue: 11; Language: English

DOI

10.1016/j.tics.2023.08.006

ISSN

1879-307X

Authors

Idan Blank

Topic(s)

Speech and dialogue systems

Abstract

Do large language models (LLMs) constitute a computational account of how humans process language? And if so, what is the role of (psycho)linguistic theory in understanding the relationship between artificial and biological minds? The answer depends on choosing among several, fundamentally distinct ways of interpreting these models as hypotheses about humans.

Large language models (LLMs), like the GPTs, are deep learning systems designed to process language. They are exposed to massive collections of texts, and their explicit training objective is to successfully predict the next word (or a missing word) in a sentence or paragraph (nowadays, additional objectives are included for further training). This objective is called 'language modeling'. Meeting it requires, at minimum, mastering the distributional characteristics of words and syntactic structures in a language – learning what sentences are like. Hence, LLMs are supposed to model how utterances behave. Their success is striking: they can represent and process virtually any arbitrary sentence in a manner that allows them to output predicted continuations that are grammatical and meaningful (i.e., words appropriately combine into human-interpretable content, even if it is false, logically flawed, or biased). In this sense, LLMs are the best implemented computational model for natural language processing: they are the first systems that handle natural language at scale.

Why are LLMs so successful? Experiments testing the probabilistic next-word predictions of LLMs find that these models may have inferred many complex linguistic generalizations from their training input. Some of these generalizations can already be observed in previous generations of LLMs, like BERT and GPT-2 [1], and they span an impressive range – albeit not the full range – of linguistic phenomena. However, these results should not be embraced without skepticism, as evidence suggests that some of the apparent success of LLMs reflects brittle 'shortcuts', not true linguistic abstractions (especially in those older models) [2].
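To make this evaluation paradigm concrete, the sketch below compares the probability a pretrained causal LLM assigns to a grammatical sentence versus a minimally different ungrammatical one (a standard subject–verb agreement contrast, not an item drawn from the cited studies). It assumes the Hugging Face transformers library and the publicly released gpt2 checkpoint; neither is prescribed by the studies above, and any model exposing token-level log probabilities would serve the same purpose.

```python
# A minimal sketch of minimal-pair evaluation: does the model assign higher
# probability to the grammatical member of the pair? Assumes the Hugging Face
# `transformers` library and the public `gpt2` checkpoint (illustrative choices).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Sum of log P(word_t | words_<t) over the sentence, in nats."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]                      # each token is scored given its left context
    token_scores = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_scores.sum().item()

grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."
# If the model has induced the agreement generalization, the grammatical
# variant should receive the higher total log probability.
print(sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical))
```

Aggregating such comparisons over large suites of minimal pairs is what licenses claims about which generalizations a model has, or has not, induced, and about whether its successes rest on genuine abstractions or on shortcuts.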
The flurry of research and debates reflects fascination with a psychological question: do LLMs process language like humans? After all, LLMs are trained to predict words, and linguistic prediction in human minds is a central, robust, and ubiquitous process. Still, different systems might achieve prediction in distinct ways, and human language processing is arguably more than just prediction. Beyond being a model of language as a collection of word sequences, could LLMs be a model of language as a mental phenomenon? This second sense of 'modeling' is my focus here.

My goal is to reintroduce to this line of work a question that was central to discussions of connectionist models in the 1980s [3,4], but whose significance has since been largely forgotten. Much current work remains agnostic to this question, yet it has both deep ramifications for ongoing (psycho)linguistic debates and practical consequences for future research. Because different scientists will likely answer this question differently, LLM research will be more structured if we explicitly address it. The question is: what do we mean by 'LLMs are models of human language processing'?

One answer is that LLMs are a model of the brain: a 'mechanistic abstraction' of neural processes [5], that is, an implementation-level model in Marr's levels of analysis [6]. Critically, under this view, what LLMs implement is symbolic computation: LLM states map onto the steps of a symbolic algorithm. This view has been termed 'implementational' [3] or 'compatibilist' [4]. The distributed patterns of activation across hidden units within LLMs (i.e., the 'neurons' of the LLM) do not constitute representational currency. Instead, the functional organization of LLMs instantiates the symbolic data structures traditionally hypothesized in linguistics, and these structures themselves constitute a model of the mind. The LLM representational format uses discrete entities and categories (e.g., syntactic dependency types, thematic roles), functions and arguments, variables and fillers, etc. This view is perhaps implicitly endorsed by studies that map distributed activity in LLMs onto patterns of brain signals [7], or that try to recover symbolic structures, such as syntactic trees [8] or semantic propositions [9], from such activity patterns.
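The probing logic behind this 'implementational' reading can be pictured with a minimal, hypothetical sketch: a simple classifier is trained to read a symbolic category (say, a part-of-speech or dependency label) out of an LLM's distributed hidden states. The arrays below are random placeholders standing in for hidden states and annotations that one would extract in a real analysis; nothing here reproduces the specific methods of the cited studies, and scikit-learn is an illustrative choice of toolkit.

```python
# A minimal diagnostic-probe sketch: can a symbolic label be read out of
# distributed hidden states with a simple (here, linear) classifier?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, hidden_dim, n_labels = 2000, 768, 12

# Placeholder data: in a real analysis, `states` would hold one hidden-state
# vector per token (from some layer of a pretrained LLM) and `labels` the
# corresponding symbolic annotation (e.g., a part-of-speech tag).
states = rng.standard_normal((n_tokens, hidden_dim))
labels = rng.integers(0, n_labels, size=n_tokens)

X_train, X_test, y_train, y_test = train_test_split(
    states, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# With real hidden states, high held-out accuracy (relative to control tasks)
# is taken as evidence that the symbolic category is linearly recoverable from
# the distributed code; with the random placeholders above, accuracy stays
# near chance (about 1/12).
print("probe accuracy:", probe.score(X_test, y_test))
```

Recovering full syntactic trees [8] follows the same logic, with distances in a learned projection of the hidden states standing in for distances in the tree.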
Under this view, (psycho)linguists still shoulder the burden of theory building at the representation level. Nonetheless, the study of LLMs is not limited to theory verification. The main goals for future work should be: (i) specifying the mapping between the implementational architecture of LLMs and symbolic structures; (ii) illuminating the conditions for the emergence of these structures (e.g., an LLM's training objective [7], training input, or architecture); and (iii) making LLMs more neurocognitively plausible (Box 1).

Box 1. Neurocognitive constraints and LLMs

A model of human language processing should receive the same types of input, and face the same linguistic challenges, as humans. For instance, LLMs should exhibit a human-like trajectory of language acquisition, based on input of the size and content available to children. They should process not only text but also speech/sign and use multimodal information. Such work is already underway; yet future research should incorporate additional neurocognitive constraints.
• In current LLMs, all words are available for processing in parallel. However, human language processing is strongly shaped by the sequential nature of the input because working memory capacity is limited: representations of past input decay and interfere with one another.
• Current LLMs are feed-forward, but the human brain critically relies on recurrent processing.
• LLMs integrate information over thousands of words, whereas language processing brain regions integrate information over fewer than 15 words (broader context is integrated in downstream regions related to episodic cognition [12]).
• LLMs are fine-tuned on a variety of tasks, but not every task conveyed via language is a linguistic task. Language processing brain regions selectively engage in linguistic processing but not in, for example, arithmetic, logical entailment, common sense reasoning, social cognition, or processing of event schemas [12].
An alternative view is that LLMs model the human mind: a representation-level model in Marr's system [6]. No correspondence exists between the functional organization of LLMs and symbolic algorithms, because LLMs, and minds, use distributed, subsymbolic data structures. (Psycho)linguistic theories are fundamentally mistaken, specifying the wrong representational format in theories that lack psychological reality. Those who treat LLMs as cognitive (vs brain) models should take this 'eliminative' view [3] seriously. If the (psycho)linguistic vocabulary is incompatible with the inner workings of LLMs, future research cannot attain goal (i) above, but goals (ii) and (iii) remain desirable: determining how linguistic knowledge emerges in LLMs and making them more brain-like. Additionally, research should delineate how LLM behavior is causally influenced by the functional circuits of the model, via targeted interventions on its activity.

Attempts at a compromise between the views articulated above mainly consist of two claims. The first is that LLMs do implement symbolic structures, but these structures are different from those postulated by (psycho)linguists (perhaps due to constraints imposed by how LLMs compute) [3]. Under this view, future work should further develop data-driven methods to discover and characterize these structures [10]. The second claim is that, even if LLMs do not run a symbolic program, they nonetheless approximate it via distributed dynamics [11]. Indeed, recovering symbolic structures from current LLMs cannot be done with perfect accuracy [8]; these imperfections might reflect such approximation (rather than, e.g., probabilistic inference over discrete symbols).
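What 'approximating a symbolic program via distributed dynamics' might look like can be pictured with a toy tensor-product sketch in the spirit of [11] (the vectors, the roles, and the example sentence are invented for illustration): symbolic fillers are bound to structural roles by outer products and superimposed into a single distributed vector, from which the filler occupying a given role can be read back out exactly when the role vectors are orthogonal, and only approximately otherwise.

```python
# A toy tensor-product-representation sketch (illustrative, not from the cited
# work): a structured expression [subject=dogs, verb=chase, object=cats] is
# encoded as one distributed vector by binding filler vectors to role vectors.
import numpy as np

rng = np.random.default_rng(1)
dim = 64

# Random vectors standing in for symbolic fillers (words)...
fillers = {w: rng.standard_normal(dim) for w in ["dogs", "chase", "cats"]}
# ...and orthonormal vectors standing in for structural roles (positions).
role_basis = np.linalg.qr(rng.standard_normal((dim, 3)))[0].T
roles = dict(zip(["subject", "verb", "object"], role_basis))

# Bind each filler to its role (outer product) and superimpose (sum).
structure = sum(
    np.outer(fillers[w], roles[r])
    for w, r in [("dogs", "subject"), ("chase", "verb"), ("cats", "object")]
)

# Unbind: projecting the structure onto a role vector recovers the filler bound
# to that role (exactly here, because the role vectors are orthonormal; only
# approximately if roles are noisy or non-orthogonal).
def unbind(structure: np.ndarray, role_vec: np.ndarray) -> np.ndarray:
    return structure @ role_vec

readout = unbind(structure, roles["object"])
similarities = {
    w: float(np.dot(readout, v) / (np.linalg.norm(readout) * np.linalg.norm(v)))
    for w, v in fillers.items()
}
print(max(similarities, key=similarities.get))  # expected: 'cats'
```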
Some might interpret 'symbolic approximation' to mean that the generating mechanism of language is truly symbolic, but that a sufficient condition for LLMs to successfully process the resulting language is a fuzzy version of this symbolic structure in some continuous, latent representational space. Parallel arguments could be made for other cognitive domains, for example, object vision: perhaps it requires merely approximating the generative symbolic system of entities, physical forces, and viewing parameters that produce sensory data. However, this argument fails in the language domain. Whereas the generative system for objects is external to the mind, the one underlying language is internal to the mind. One cannot argue that: (i) the mind uses a symbolic system to generate language; (ii) LLMs are a continuous approximation of this system (with no instantiation of discrete symbols); and (iii) LLMs are a cognitive model of the mind. Either the mind uses an explicitly symbolic representation, such that there is no need to approximate it (i.e., LLMs are not models of the mind); or the mind, like LLMs, uses a distributed representation that does not precisely map onto symbols, such that symbols exist nowhere and hence there is nothing to approximate. (I am not critiquing representational pluralism in general, only claiming that access to a symbolic system obviates the need for approximation.) Symbolic approximation, interpreted this way, is not a viable cognitive hypothesis.

An alternative interpretation is that approximation describes the relationship between symbolic (psycho)linguistic theories and a non-symbolic mind. Rather than the mind approximating symbols, theories of symbolic computation approximate cognition [11]. They are an abstraction lacking psychological reality, but instrumental for human-interpretable explanation. In this sense, language is symbolic only if we squint.

As this discussion demonstrates, different views regarding LLMs mirror longstanding theoretical disagreements in psycholinguistics about the format of linguistic representations. If LLMs indeed process language like humans, they could strongly inform these debates, because an LLM is itself a computational instantiation of a linguistic theory. This theory remains to be fully specified, by reverse engineering, because LLMs were not straightforwardly derived from pre-existing cognitive accounts (unlike, e.g., some incremental parsers). Therefore, by articulating the different interpretations of LLMs, this piece also advances the view that LLMs could serve as new tools to answer old questions.

LLMs are uniquely powerful computational models. However, language scientists might lack consensus regarding what they are a model of (Figure 1), and different positions have distinct implications for how LLMs, and the theories they embody, could interact with traditional (psycho)linguistic theories going forward. Importantly, all the directions for future work described here are already being pursued, some less developed than others. Nonetheless, work relating LLMs to humans can be more theoretically grounded, productive, and integrative if we each clarify our views about what it is that LLMs are supposed to model.

No interests are declared.

References

1. Wilcox, E.G. et al. (2022) Learning syntactic structures from string input. In Algebraic Structures in Natural Language (Lappin, S. and Bernardy, J.-P., eds), pp. 113–138, CRC Press
2. Chaves, R.P. and Richter, S.N. (2021) Look at that! BERT can be easily distracted from paying attention to morphosyntax. Proc. Soc. Comput. Linguist. 4, 28–38
3. Pinker, S. and Prince, A. (1988) On language and connectionism: analysis of a parallel distributed processing model of language acquisition. Cognition 28, 73–193
4. Bechtel, W. and Abrahamsen, A. (1991) Connectionism and the Mind: an Introduction to Parallel Processing in Networks, Basil Blackwell
5. Cao, R. and Yamins, D. (2021) Explanatory models in neuroscience: Part 1 – taking mechanistic abstraction seriously. arXiv. https://arxiv.org/abs/2104.01490
6. Marr, D. (1982) The philosophy and the approach. In Vision: a Computational Investigation into the Human Representation and Processing of Visual Information, pp. 8–38, MIT Press
7. Schrimpf, M. et al. (2021) The neural architecture of language: integrative modeling converges on predictive processing. Proc. Natl. Acad. Sci. 118, e2105646118
8. Manning, C.D. et al. (2020) Emergent linguistic structure in artificial neural networks trained by self-supervision. Proc. Natl. Acad. Sci. 117, 30046–30054
9. Wong, L. et al. (2023) From word models to world models: translating from natural language to the probabilistic language of thought. arXiv. https://arxiv.org/abs/2306.12672
10. Soulos, P. et al. (2019) Discovering the compositional structure of vector representations with role learning networks. arXiv. https://arxiv.org/abs/1910.09113
11. Smolensky, P. et al. (2022) Neurocompositional computing: from the central paradox of cognition to a new generation of AI systems. AI Mag. 43, 308–322
12. Mahowald, K., Ivanova, A. et al. (2023) Dissociating language and thought in large language models: a cognitive perspective. arXiv. https://arxiv.org/abs/2301.06627