Peer-reviewed Article

One shot learning of simple visual concepts

2011; Wiley; Volume: 33; Issue: 33; Language: English

ISSN

1551-6709

Authors

Brenden M. Lake, Ruslan Salakhutdinov, Jason Gross, Joshua B. Tenenbaum

Topic(s)

Advanced Image and Video Retrieval Techniques

Abstract

One shot learning of simple visual concepts

Brenden M. Lake, Ruslan Salakhutdinov, Jason Gross, and Joshua B. Tenenbaum
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Abstract

People can learn visual concepts from just one example, but it remains a mystery how this is accomplished. Many authors have proposed that transferred knowledge from more familiar concepts is a route to one shot learning, but what is the form of this abstract knowledge? One hypothesis is that the sharing of parts is core to one shot learning, and we evaluate this idea in the domain of handwritten characters, using a massive new dataset. These simple visual concepts have a rich internal part structure, yet they are particularly tractable for computational models. We introduce a generative model of how characters are composed from strokes, where knowledge from previous characters helps to infer the latent strokes in novel characters. The stroke model outperforms a competing state-of-the-art character model on a challenging one shot learning task, and it provides a good fit to human perceptual data.

Keywords: category learning; transfer learning; Bayesian modeling; neural networks

Figure 1: Test yourself on one shot learning. From the example boxed in red, can you find the others in the array? On the left is a Segway and on the right is the first character of the Bengali alphabet. Answer for the Bengali character: Row 2, Column 3; Row 4, Column 2.

A hallmark of human cognition is learning from just a few examples. For instance, a person only needs to see one Segway to acquire the concept and be able to discriminate future Segways from other vehicles like scooters and unicycles (Fig. 1 left). Similarly, children can acquire a new word from one encounter (Carey & Bartlett, 1978). How is one shot learning possible?

New concepts are almost never learned in a vacuum. Past experience with other concepts in a domain can support the rapid learning of novel concepts, by showing the learner what matters for generalization. Many authors have suggested this as a route to one shot learning: transfer of abstract knowledge from old to new concepts, often called transfer learning, representation learning, or learning to learn. But what is the nature of the learned abstract knowledge that lets humans acquire new object concepts so quickly?

The most straightforward proposals invoke attentional learning (Smith, Jones, Landau, Gershkoff-Stowe, & Samuelson, 2002) or overhypotheses (Kemp, Perfors, & Tenenbaum, 2007; Dewar & Xu, in press), like the shape bias in word learning. Prior experience with concepts that are clearly organized along one dimension (e.g., shape, as opposed to color or material) draws a learner's attention to that same dimension (Smith et al., 2002) – or increases the prior probability of new concepts concentrating on that same dimension (Kemp et al., 2007). But this approach is limited since it requires that the relevant dimensions of similarity be defined in advance.

For many real-world concepts, the relevant dimensions of similarity may be constructed in the course of learning to learn. For instance, when we first see a Segway, we may parse it into a structure of familiar parts arranged in a novel configuration: it has two wheels, connected by a platform, supporting a motor and a central post at the top of which are two handlebars. These parts and their relations comprise a useful representational basis for many different vehicle and artifact concepts – a representation that is likely learned in the course of learning the concepts that they support.

Figure 2: Examples from a new 1600 character database.
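To make the stroke-based idea from the abstract concrete, here is a minimal, hypothetical Python sketch of a character as a small set of strokes drawn from a library abstracted from familiar characters. It is not the authors' model: the stroke library, priors, noise values, and the greedy scoring rule are all illustrative assumptions.

```python
# Toy sketch (illustrative assumptions, not the paper's implementation):
# a character is a set of strokes; strokes for a new character are drawn
# from a "library" abstracted from previously learned characters.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stroke library: each prototype is a sequence of 2-D control
# points in the unit square, standing in for knowledge from past characters.
stroke_library = [
    np.array([[0.1, 0.9], [0.1, 0.1]]),              # vertical bar
    np.array([[0.1, 0.5], [0.9, 0.5]]),              # horizontal bar
    np.array([[0.1, 0.9], [0.5, 0.1], [0.9, 0.9]]),  # "V" shape
]

def sample_character(max_strokes=3, jitter=0.03):
    """Sample a new character as a list of jittered library strokes.

    The number of strokes and the prototype choices use simple uniform
    priors; a real model would learn these distributions from data.
    """
    n_strokes = rng.integers(1, max_strokes + 1)
    strokes = []
    for _ in range(n_strokes):
        proto = stroke_library[rng.integers(len(stroke_library))]
        offset = rng.uniform(-0.05, 0.05, size=2)           # placement noise
        noise = rng.normal(0.0, jitter, size=proto.shape)   # motor noise per point
        strokes.append(proto + offset + noise)
    return strokes

def log_prob_under_example(strokes, example_strokes, jitter=0.03):
    """Crude stand-in for inferring latent strokes: greedily match each new
    stroke to the best shape-compatible stroke of the stored example and
    sum Gaussian log-densities of the point-wise differences."""
    total = 0.0
    for s in strokes:
        best = -np.inf
        for e in example_strokes:
            if s.shape == e.shape:
                diff = s - e
                best = max(best, -0.5 * np.sum(diff ** 2) / jitter ** 2)
        total += best if np.isfinite(best) else -1e6  # penalty: no compatible stroke
    return total

if __name__ == "__main__":
    example = sample_character()    # one "training" example of a new concept
    new_token = sample_character()  # a query token to be explained
    print(log_prob_under_example(new_token, example))
```

Under this kind of sketch, one shot generalization amounts to asking which stored single example best explains the latent strokes of a new token.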
Several papers from the recent machine learning and computer vision literature argue for such an approach: joint learning of many concepts and a high-level part vocabulary that underlies those concepts (e.g., Torralba, Murphy, & Freeman, 2007; Fei-Fei, Fergus, & Perona, 2006). Another recently popular machine learning approach is based on deep learning (Salakhutdinov & Hinton, 2009): unsupervised learning of hierarchies of distributed feature representations in neural-network-style probabilistic generative models. These models do not specify explicit parts and structural relations, but they can still construct meaningful representations of what makes two objects deeply similar that go substantially beyond low-level image features.

These approaches from machine learning may be compelling ways to understand how humans learn so quickly, but there is little experimental evidence that directly supports them. Models that construct parts or features from sensory data (pixels) while learning object concepts have been tested in elegant behavioral experiments with very simple stimuli and a very small number of concepts (Austerweil & Griffiths, 2009; Schyns, Goldstone, & Thibaut, 1998). But there have been few systematic comparisons of multiple state-of-the-art computational approaches to representation learning with hu-
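The "challenging one shot learning task" named in the abstract, and the kind of systematic model comparison the excerpt calls for, can be pictured as N-way classification from a single example per class. The sketch below is a hypothetical evaluation harness, not the paper's code; `score` stands in for whatever fit measure a given model provides (e.g., the stroke log-probability above, or a deep model's feature similarity).

```python
# Hypothetical N-way one-shot classification harness (illustrative only).
from typing import Callable, Dict, Sequence, Tuple

def one_shot_accuracy(
    train_examples: Dict[str, object],          # one example per class label
    test_items: Sequence[Tuple[object, str]],   # (item, true_label) pairs
    score: Callable[[object, object], float],   # model-specific fit of item to example
) -> float:
    """Classify each test item as the class whose single example best explains it."""
    correct = 0
    for item, true_label in test_items:
        predicted = max(train_examples, key=lambda label: score(item, train_examples[label]))
        if predicted == true_label:
            correct += 1
    return correct / len(test_items)

if __name__ == "__main__":
    # Dummy demonstration with scalar "images" and a negative-distance score.
    train = {"a": 0.0, "b": 1.0}
    test = [(0.1, "a"), (0.9, "b"), (0.4, "a")]
    print(one_shot_accuracy(train, test, score=lambda x, ex: -abs(x - ex)))
```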

Reference(s)