Review | Open access | Peer reviewed

Artificial Intelligence and the Common Sense of Animals

2020; Elsevier BV; Volume: 24; Issue: 11; Language: English

DOI

10.1016/j.tics.2020.09.002

ISSN

1879-307X

Authors

Murray Shanahan, Matthew Crosby, Benjamin Beyret, Lucy G. Cheke

Topic(s)

Evolutionary Game Theory and Cooperation

Abstract

•Endowing computers with common sense remains one of the biggest challenges in the field of artificial intelligence (AI).
•Most treatments of the topic foreground language, yet an understanding of everyday concepts such as objecthood, containers, obstructions, paths, etc. is arguably: (i) a prerequisite for language, and (ii) evident to some degree in non-human animals.
•The recent advent of deep reinforcement learning (RL) in 3D simulated environments allows AI researchers to train and test (virtually) embodied agents in conditions analogous to the life of an animal.
•With the right architecture, an RL agent inhabiting a simulated 3D world has the potential to acquire a repertoire of fundamental common sense concepts and principles, given suitable environments, tasks, and curricula.
•Experimental protocols from the field of animal cognition can be repurposed for evaluating the extent to which an agent, after training, 'understands' a common sense concept or principle, in particular in a transfer setting.

The problem of common sense remains a major obstacle to progress in artificial intelligence. Here, we argue that common sense in humans is founded on a set of basic capacities that are possessed by many other animals, capacities pertaining to the understanding of objects, space, and causality. The field of animal cognition has developed numerous experimental protocols for studying these capacities and, thanks to progress in deep reinforcement learning (RL), it is now possible to apply these methods directly to evaluate RL agents in 3D environments. Besides evaluation, the animal cognition literature offers a rich source of behavioural data, which can serve as inspiration for RL tasks and curricula.

The challenge of endowing computers with common sense has been seen as a major obstacle to achieving the boldest aims of artificial intelligence (AI) since the field's earliest days [1.McCarthy J. Programs with common sense.in: Proceedings of the Teddington Conference on the Mechanization of Thought Processes. Her Majesty's Stationary Office, 1959: 75-91Google Scholar] and it remains a significant problem today [2.Garnelo M. et al.Towards deep symbolic reinforcement learning.arXiv. 2016; (1609.05518)Google Scholar, 3.Davis E. Marcus G. Commonsense Reasoning and Commonsense Knowledge in Artificial Intelligence.Commun. ACM. 2015; 58: 92-103Crossref Scopus (175) Google Scholar, 4.Lake B.M. et al.Building machines that learn and think like people.Behav. Brain Sci. 2017; 40Crossref Scopus (705) Google Scholar, 5.Marcus G. Davis E. Rebooting AI: Building Artificial Intelligence We Can Trust. Ballantine Books Inc., 2019Google Scholar, 6.Smith B.C. The Promise of Artificial Intelligence: Reckoning and Judgment. MIT Press, 2019Crossref Google Scholar]. There is no universally accepted definition of common sense.
However, most authors use language as a touchstone, following the example of [1.McCarthy J. Programs with common sense.in: Proceedings of the Teddington Conference on the Mechanization of Thought Processes. Her Majesty's Stationary Office, 1959: 75-91Google Scholar], who stated that '[a] program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows'. Consequently, tests for common sense are typically language based. For example, one such test uses so-called 'Winograd schemas' [7.Levesque H.J. et al.The Winograd schema challenge.in: KR'12: Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning. 2012: 552-561Google Scholar, 8.Sakaguchi K. et al.WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale.in: AAAI Conference on Artificial Intelligence. 2020Crossref Google Scholar, 9.Brown T.B. et al.Language models are few-shot learners.arXiv. 2020; (2005.14165)Google Scholar]. These are pairs of sentences that differ by a single word and contain an ambiguous pronoun whose resolution depends on understanding some aspect of common sense. Consider the sentences 'The falling rock smashed the bottle because it was heavy' and 'The falling rock smashed the bottle because it was fragile'. The pronoun 'it' refers to the rock in the first sentence, but to the bottle in the second. We are able to resolve the pronoun correctly in each case because of our common sense understanding of falling and fragility. In this paper, by contrast, we will set language temporarily to one side and focus on common sense capacities that are also found in non-human animals. Our rationale is that these capacities are also the foundation for human common sense. They are, so to speak, conceptually prior to language and human language rests on the foundation they provide [10.Shanahan M. An attempt to formalise a non-trivial benchmark problem in common sense reasoning.Artif. Intell. 2004; 153: 141-165Crossref Scopus (27) Google Scholar]. Consider the phrase 'The falling rock smashed the bottle'. To understand this sentence, you have to know what a rock is and what it means for something to fall. But to understand what a rock is, you have to know what an object is. To grasp what falling is, you have to understand motion and space. And to understand the relationship between falling and smashing, you have to understand causality. Indeed, an understanding of objects, motion, space, and causality is a prerequisite for understanding any aspect of the everyday world, not just falling and rocks (cf. [11.Hayes P.J. The Naive Physics Manifesto.in: Michie D. Expert Systems in the Electronic Age. Edinburgh University Press, 1979: 242-270Google Scholar] and [12.Spelke E.S. Core Knowledge.Am. Psychol. 2000; 2000: 1233-1243Crossref Scopus (356) Google Scholar], not to mention Kant [13.Strawson P.F. The Bounds of Sense: An Essay on Kant's Critique of Pure Reason. Methuen & Co., 1966Google Scholar]). Unfortunately, this foundational layer of common sense, which is a prerequisite for human-level intelligence, is lacking in today's AI systems. Yet, thanks to a combination of evolution and learning, it is manifest in many non-human animals, to a greater or lesser degree [14.Call J. Object permanence in orangutans (Pongo pygmaeus), chimpanzees (Pan troglodytes), and children (Homo sapiens).J. Comp. Psychol. 2001; 115: 159-171Crossref PubMed Scopus (103) Google Scholar, 15.Seed A.M. 
et al.Investigating Physical Cognition in Rooks, Corvus frugilegus.Curr. Biol. 2006; 16: 697-701Abstract Full Text Full Text PDF PubMed Scopus (135) Google Scholar, 16.Taylor A.H. et al.Do New Caledonian crows solve physical problems through causal reasoning?.Proc. R. Soc. B Biol. Sci. 2009; 276: 247-254Crossref PubMed Scopus (143) Google Scholar, 17.Bird C.D. Emery N.J. Rooks use stones to raise the water level to reach a floating worm.Curr. Biol. 2009; 19: 1410-1414Abstract Full Text Full Text PDF PubMed Scopus (124) Google Scholar, 18.Cheke L.G. et al.Tool-use and instrumental learning in the Eurasian jay (Garrulus glandarius).Anim. Cogn. 2011; 14: 441-455Crossref PubMed Scopus (79) Google Scholar, 19.Vallortigara G. Core knowledge of object, number, and geometry: A comparative and neural approach.Cogn. Neuropsychol. 2012; 29: 213-236Crossref PubMed Scopus (86) Google Scholar, 20.Takagi S. et al.There's no ball without noise: cats' prediction of an object from noise.Anim. Cogn. 2016; 19: 1043-1047Crossref PubMed Scopus (10) Google Scholar]. For this reason, as we aim to show in this paper, the field of animal cognition [21.Shettleworth S.J. Cognition, Evolution, and Behavior. Oxford University Press, 2010Google Scholar] has a lot to offer AI. This is especially true in a reinforcement learning (RL) context, where, thanks to progress in deep learning [22.Arulkumaran K. et al.Deep reinforcement learning: a brief survey.IEEE Signal Process. Mag. 2017; 34: 26-38Crossref Scopus (987) Google Scholar], it is now possible to bring the methods of comparative cognition directly to bear [23.Beyret B. et al.The Animal-AI environment: training and testing animal-like artificial cognition.arXiv. 2019; (1909.07483)Google Scholar, 24.Crosby M. Building Thinking Machines by Solving Animal Cognition Tasks.Mind. Mach. 2020; Crossref Scopus (8) Google Scholar]. In particular, animal cognition supplies a compendium of well-understood, nonlinguistic, intelligent behaviour; it suggests experimental methods for evaluation and benchmarking; and it can guide environment and task design [25.Versace E. et al.Priors in animal and artificial intelligence: where does learning begin?.Trends Cogn. Sci. 2018; 22: 963-965Abstract Full Text Full Text PDF PubMed Scopus (22) Google Scholar]. Until the mid-2010s, it barely even made sense to think of assessing the cognitive abilities of a real AI system (as opposed to a hypothetical system of the future) using the same methods that are used to assess the cognitive abilities of animals. Students of animal cognition can take for granted a number of background assumptions that do not necessarily apply for an AI system. These include facts that are so obvious they go entirely unnoticed, such as the embodiment of an animal and its situatedness in a 3D spatial environment within which it can move and that contains objects with which it can interact [10.Shanahan M. An attempt to formalise a non-trivial benchmark problem in common sense reasoning.Artif. Intell. 2004; 153: 141-165Crossref Scopus (27) Google Scholar]. An equally obvious assumption that animal researchers can safely (and unconsciously) make is that their subjects are motivated by various basic needs and will therefore exhibit purposeful behaviour, rather than, say, simply doing nothing. None of these things is inherently true of an AI system. A disembodied digital assistant, such as Siri or Alexa, cannot be placed in a maze or presented with a box containing food. 
In the context of a robot, an embodied system that interacts with the real world, such a prospect at least makes sense. But most robots are programmed to carry out predefined tasks in highly constrained circumstances, and presenting one with a novel situation is more likely to result in inaction or catastrophe than to elicit interesting behaviour. With the advent of deep RL, however, these background assumptions can be satisfied and the cognitive prowess of an AI system can be evaluated using methods that were designed for animals. The RL setting, wherein an agent learns by trial-and-error to maximise its expected reward over time (Box 1), precludes inactivity and permits any cognitive challenge to be presented by means of a suitably designed environment and reward function. Until recently, RL systems with high-dimensional input (such as vision) were impractical. But this changed in the mid-2010s, when RL was paired with deep neural networks [26.LeCun Y. et al.Deep learning.Nature. 2015; 521: 436-444Crossref PubMed Scopus (37165) Google Scholar, 27.Schmidhuber J. Deep learning in neural networks: an overview.Neural Netw. 2015; 61: 85-117Crossref PubMed Scopus (9530) Google Scholar], inaugurating a new subfield. Among other successes, this led to the development of AlphaGo, the first program to defeat a top-ranked player at the game of Go [28.Silver D. et al.Mastering the Game of Go with Deep Neural Networks and Tree Search.Nature. 2016; 529: 484-489Crossref PubMed Scopus (7611) Google Scholar]. But the original breakthrough was DeepMind's DQN, a deep RL system that mastered a suite of Atari video games, playing many at superhuman level [29.Mnih V. et al.Human-level control through deep reinforcement learning.Nature. 2015; 518: 529-533Crossref PubMed Scopus (11820) Google Scholar] (Box 2).

Box 1. Model-Free and Model-Based Reinforcement Learning (RL)

Suppose an agent acts according to a policy that maps its current input (possibly along with its internal state) to a recommendation for action. The job of RL is to improve the agent's policy, through trial and error, so as to maximise expected reward over time. In computer science, the study of RL has produced a substantial body of computational techniques for solving this problem [68.Sutton R.S. Barto A.G. Reinforcement Learning: An Introduction.2nd edition. MIT Press, 2018Google Scholar]. Recently, this field has been dominated by deep RL, wherein a deep neural network is trained to compute the function that maps inputs to actions [22.Arulkumaran K. et al.Deep reinforcement learning: a brief survey.IEEE Signal Process. Mag. 2017; 34: 26-38Crossref Scopus (987) Google Scholar]. This has led to success in domains with high-dimensional input spaces, for example, the stream of images from a camera.

RL methods (independently of whether they use deep neural networks) can be broadly categorised as either model-based or model-free ([68.Sutton R.S. Barto A.G. Reinforcement Learning: An Introduction.2nd edition. MIT Press, 2018Google Scholar], Chapter 8). The models in question are transition models that map states and actions to successor states and rewards (or to distributions of successor states and rewards). That is to say, they allow the agent to predict the outcome of its (prospective) actions. In model-free RL, the agent learns and enacts its policy without reference to an explicit transition model. Model-free RL can be extremely effective and is the basis for many of the most impressive recent results in the field.
However, if a transition model is available, or can be learned, then an agent can use it to simulate interaction with the environment (a form of inner rehearsal), without having to interact with the environment directly, enabling it to improve its policy offline [69.Kaiser L. et al.Model Based Reinforcement Learning for Atari.in: International Conference on Learning Representations. 2020Google Scholar] and/or to plan a course of actions prior to their execution [70.Racanière S. et al.Imagination-augmented agents for deep reinforcement learning.in: Advances in Neural Information Processing Systems 30. 2017: 5690-5701Google Scholar].

Additionally, a good model of how the world works has general application, which enables transfer learning and promotes data efficiency [30.Kansky K. et al.Schema networks: zero-shot transfer with a generative causal model of intuitive physics.in: Proceedings International Conference on Machine Learning. 2017: 1809-1818Google Scholar,71.Hamrick J.B. Analogues of mental simulation and imagination in deep learning.Curr. Opin. Behav. Sci. 2019; 29: 8-16Crossref Scopus (22) Google Scholar]. For example, if an agent understands that a long, thin, rigid object affords moving a reward item that is otherwise inaccessible, then it can apply that understanding to sticks and tubes, even if it has never encountered such objects before. However, current model-based deep RL methods, when applied to visual input, typically predict only the next few frames and do so at the pixel level. To realise the full promise of a model-based approach, RL methods will need to operate on a more abstract level.

Box 2. Training Protocols for Reinforcement Learning (RL)

A variety of protocols are used to train deep RL agents [22.Arulkumaran K. et al.Deep reinforcement learning: a brief survey.IEEE Signal Process. Mag. 2017; 34: 26-38Crossref Scopus (987) Google Scholar]. The simplest is the single task setting. The agent is presented with one task many times, such as the Atari game Space Invaders, and its performance slowly improves [29.Mnih V. et al.Human-level control through deep reinforcement learning.Nature. 2015; 518: 529-533Crossref PubMed Scopus (11820) Google Scholar]. If the agent matches human performance, it is often said to have 'solved' the task. To solve a new task, such as the game Breakout, the agent is then re-initialised and learns it from scratch, losing its skill at Space Invaders in the process. In a multitask setting, by contrast, the agent learns many tasks together. If training is successful, the resulting agent will perform well on any of them. In the multitask setting, training tasks are often presented concurrently. Typically, episodes from different tasks are chopped up, interleaved, and stored in a replay buffer, and then presented to the learning component of the agent in a random order.

In a sequential multitask setting (an example of continual learning [57.Schwarz J. et al.Progress & compress: a scalable framework for continual learning.in: Proceedings International Conference on Machine Learning. 2018: 4528-4537Google Scholar]), the agent learns tasks one at a time rather than concurrently. However, in contrast to the single task setting, the agent is not re-initialised after each task, and the final trained agent is expected to perform well on all the tasks [56.Badia A.P. et al.Agent57: outperforming the Atari human benchmark.in: Proceedings International Conference on Machine Learning. 2020Google Scholar].
From an engineering point of view, sequential multitask training is most difficult to get right, partly because of the phenomenon of catastrophic forgetting, wherein an agent trained on a new task loses its ability to perform well on tasks it was previously trained on [72.McCloskey M. Cohen N.J. Catastrophic interference in connectionist networks: The sequential learning problem.in: Psychology of Learning and Motivation. Vol. 24. Elsevier, 1989: 109-165Google Scholar,73.Kirkpatrick J. et al.Overcoming catastrophic forgetting in neural networks.Proc. Natl. Acad. Sci. U.S.A. 2017; 114: 3521-3526Crossref PubMed Scopus (1396) Google Scholar].

Of course, the life of an animal is not divided into tasks (except perhaps in the laboratory). Rather, it is one long seamless episode. In this respect, the most realistic continual learning setting is one where there are no task boundaries, and this is arguably the most promising training protocol for acquiring the foundations of common sense. In RL terms, the objective is the same, to maximise expected reward over time, but the agent has a single, indefinitely extended 'life' rather than experiencing a sequence of discrete episodes. Despite the lack of task boundaries in this setting, it may be appropriate to structure the agent's life as a curriculum, where the agent has to become proficient on simpler tasks before it is confronted with more complex ones [74.Elman J.L. Learning and development in neural networks: the importance of starting small.Cognition. 1993; 48: 71-99Crossref PubMed Scopus (943) Google Scholar,75.Bengio Y. et al.Curriculum Learning.in: Proceedings International Conference on Machine Learning. 2009: 41-48Crossref Scopus (781) Google Scholar].
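To make the distinction drawn in Box 1 concrete, here is a minimal, illustrative Python sketch (not taken from the article) for a tiny discrete setting: a model-free Q-learning update alongside a Dyna-style model-based variant that records a learned transition model and uses it for offline rehearsal. The action set, hyperparameters, and the assumption of a small discrete state space are placeholders.

    import random
    from collections import defaultdict

    ACTIONS = [0, 1, 2, 3]             # placeholder discrete action set
    ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

    Q = defaultdict(float)             # model-free: state-action value estimates
    model = {}                         # model-based: learned (s, a) -> (s', r) table

    def act(state):
        # Epsilon-greedy policy derived from the current value estimates.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def model_free_update(s, a, r, s_next):
        # Q-learning: improve the policy directly from an experienced transition,
        # with no explicit model of the environment.
        target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

    def model_based_update(s, a, r, s_next, n_planning=10):
        # Dyna-style: record the transition in a learned model, then rehearse
        # with the model offline instead of acting in the environment.
        model[(s, a)] = (s_next, r)
        model_free_update(s, a, r, s_next)
        for _ in range(n_planning):
            (ps, pa), (ps_next, pr) = random.choice(list(model.items()))
            model_free_update(ps, pa, pr, ps_next)

A full agent would replace the tabular value function with a deep network and the lookup-table model with a learned predictive model, but the division of labour between acting, learning from real transitions, and rehearsing with a model is the same.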
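The training protocols described in Box 2 can likewise be sketched as three hypothetical training loops. Here make_agent, train_one_episode, collect_episode, and the agent's learn method are assumed stand-ins rather than functions from any particular library; collect_episode is assumed to return a list of individual transitions.

    import random

    def single_task(make_agent, train_one_episode, task, episodes=1000):
        agent = make_agent()                    # re-initialised for each new task
        for _ in range(episodes):
            train_one_episode(agent, task)
        return agent

    def concurrent_multitask(make_agent, collect_episode, tasks, episodes=1000):
        agent = make_agent()
        replay = []                             # replay buffer of chopped-up episodes
        for _ in range(episodes):
            task = random.choice(tasks)         # episodes from different tasks...
            replay.extend(collect_episode(agent, task))
            agent.learn(random.choice(replay))  # ...interleaved and replayed in random order
        return agent

    def sequential_multitask(make_agent, train_one_episode, tasks, episodes_per_task=1000):
        agent = make_agent()                    # one agent, never re-initialised
        for task in tasks:                      # tasks presented one at a time;
            for _ in range(episodes_per_task):  # catastrophic forgetting is the main risk
                train_one_episode(agent, task)
        return agent

The continual setting favoured in the main text corresponds to running a single agent in one unbroken loop with no task boundaries at all, with any curriculum expressed only through which situations the environment presents next.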
Notwithstanding its impressive performance on Atari games, DQN inherited a number of shortcomings from deep learning [2.Garnelo M. et al.Towards deep symbolic reinforcement learning.arXiv. 2016; (1609.05518)Google Scholar,4.Lake B.M. et al.Building machines that learn and think like people.Behav. Brain Sci. 2017; 40Crossref Scopus (705) Google Scholar]. First, it is not data efficient. It has to play far more games than a typical human does to reach human-level performance. Second, it is brittle. A trained network is not robust to small changes in the game that a human would barely notice, such as background colour or the sizes of objects. Third, it is inflexible. Nothing of what it has learned on one game can be transferred to another similar game. (As many commentators have argued, human prowess in these respects can, in part, be attributed to the ability to bring common sense priors to bear when learning a new game, such as our everyday understanding of objects, motion, collision, gravity, etc. [2.Garnelo M. et al.Towards deep symbolic reinforcement learning.arXiv. 2016; (1609.05518)Google Scholar,4.Lake B.M. et al.Building machines that learn and think like people.Behav. Brain Sci. 2017; 40Crossref Scopus (705) Google Scholar,30.Kansky K. et al.Schema networks: zero-shot transfer with a generative causal model of intuitive physics.in: Proceedings International Conference on Machine Learning. 2017: 1809-1818Google Scholar,31.Dubey R. et al.Investigating human priors for playing video games.in: Proceedings of the 35th International Conference on Machine Learning. 2018: 1349-1357Google Scholar].) Despite progress on all these issues, none of them is fully resolved.
Nevertheless, the arrival of deep RL has opened up the possibility of training an agent in a 3D virtual environment with (somewhat) realistic physics, whose input is the scene rendered from the agent's point of view, and whose output is a set of actions enabling the agent to move within the environment and interact with the objects it contains (Figure 1) [32.Beattie C. et al.DeepMind Lab.arXiv. 2016; (1612.03801)Google Scholar, 33.Brockman G. et al.OpenAI Gym.arXiv. 2016; (1606.01540)Google Scholar, 34.Kempka M. et al.ViZDoom: A Doom-based AI research platform for visual reinforcement learning.in: Proceedings IEEE Conference on Computational Intelligence and Games. 2016Crossref Scopus (219) Google Scholar]. These objects can include universal reward items analogous to food in the natural world, such as green spheres that yield positive reward when touched, then disappear as if consumed [23.Beyret B. et al.The Animal-AI environment: training and testing animal-like artificial cognition.arXiv. 2019; (1909.07483)Google Scholar, 24.Crosby M. Building Thinking Machines by Solving Animal Cognition Tasks.Mind. Mach. 2020; Crossref Scopus (8) Google Scholar]. At a fundamental level, the predicament of such an agent can be considered analogous to that of an animal. Although animals also act on various forms of intrinsic motivation (including curiosity, which we are certainly not ruling out for our agents), we contend that any cognitive challenge can be presented to a situated, (virtually) embodied RL agent in the guise of obtaining an external reward of a single type. Moreover, to the extent that such an agent acquires an 'understanding' of any aspect of common sense, this can be thought of as grounded in its interaction with its world, even though that world is virtual not physical. (In the literature, the concept of grounding is typically associated with symbols [35.Harnad S. The symbol grounding problem.Phy. D Nonlinear Phenom. 1990; 42: 335-346Crossref Scopus (2094) Google Scholar,36.Higgins I
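As a rough illustration of the set-up described above, the interaction loop for such an agent might look as follows. The environment object env, the RandomAgent policy, and the reset/step signatures are hypothetical placeholders (this is not the API of DeepMind Lab, OpenAI Gym, ViZDoom, or the Animal-AI environment); a trained deep RL agent would stand in for the random policy.

    import numpy as np

    class RandomAgent:
        # Placeholder policy: chooses among discrete actions uniformly at random.
        def __init__(self, n_actions):
            self.n_actions = n_actions

        def act(self, frame):
            return np.random.randint(self.n_actions)

    def run_episode(env, agent, max_steps=1000):
        # 'env' is assumed to render the scene from the agent's viewpoint and to
        # yield positive reward when a reward item (e.g., a green sphere) is touched.
        frame = env.reset()                     # RGB image observation
        total_reward = 0.0
        for _ in range(max_steps):
            action = agent.act(frame)           # e.g., move forward, turn left, turn right
            frame, reward, done = env.step(action)
            total_reward += reward              # consuming a reward item yields +1
            if done:
                break
        return total_reward

Evaluation protocols borrowed from animal cognition would then amount to constructing specific environment configurations and scoring the trained agent's success on held-out variants, as in a transfer setting.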
