Peer-reviewed article

Canonical views of scenes depend on the shape of the space

2011; Wiley; Volume: 33; Issue: 33; Language: English

ISSN

1551-6709

Authors

Krista A. Ehinger, Aude Oliva

Topic(s)

Spatial Cognition and Navigation

Abstract

Krista A. Ehinger (kehinger@mit.edu)
Department of Brain & Cognitive Sciences, MIT, 77 Massachusetts Ave., Cambridge, MA 02139 USA

Aude Oliva (oliva@mit.edu)
Department of Brain & Cognitive Sciences, MIT, 77 Massachusetts Ave., Cambridge, MA 02139 USA

When recognizing or depicting objects, people show a preference for particular “canonical” views. Are there similar preferences for particular views of scenes? We investigated this question using panoramic images, which show a 360-degree view of a location. Observers used an interactive viewer to explore the scene and select the best view. We found that agreement between observers on the “best” view of each scene was generally high. We attempted to predict the selected views using a model based on the shape of the space around the camera location and on the navigational constraints of the scene. The model performance suggests that observers select views which capture as much of the surrounding space as possible, but do not consider navigational constraints when selecting views. These results seem analogous to findings with objects, which suggest that canonical views maximize the visible surfaces of an object, but are not necessarily functional views.

Keywords: canonical view; scene perception; panoramic scenes.

Introduction

Although people can recognize familiar objects in any orientation, there seem to be preferred or standard views for recognizing and depicting objects. These preferred views, called “canonical” views, are the views that observers select as best when they are shown various views of an object, and these are the views that people usually produce when they are asked to photograph or form a mental image of an object (Palmer, Rosch, & Chase, 1981). In general, the canonical view of an object is a view which maximizes the amount of visible object surface.
The canonical view varies across objects and seems to depend largely on the shape of the object. For most three-dimensional objects (e.g., a shoe or an airplane), observers prefer a three-quarters view which shows three sides of the object (such as the front, top, and side). However, straight-on views may be preferred for flatter objects like forks, clocks, and saws, presumably because the front of the object contains the most surface area and conveys the most information about object identity (Verfaillie & Boutsen, 1995). In addition, observers avoid views in which an object is partly occluded by its parts, and they avoid accidental views which make parts of the object difficult to see (Blanz, Tarr, & Bülthoff, 1999).

Canonical views of objects may also reflect the ways people interact with objects. People show some preferences for elevated views of smaller objects, but ground-level views of larger objects (Verfaillie & Boutsen, 1995). The ground-level views show less of the object (because they omit the top plane), but seem to be more canonical for large objects such as trucks or trains because these objects are rarely seen from above. However, these sorts of preferences may be due to greater familiarity with certain views, not functional constraints per se. Observers do not consistently select views in which an object is oriented for grasping (e.g., a teapot with the handle towards the viewer), and when subjects do choose these views, they don’t match the handle’s left/right orientation to their dominant hand (Blanz, Tarr, & Bülthoff, 1999).

Scenes and places, like objects, are three-dimensional entities that are experienced and recognized from a variety of angles. Therefore, it seems reasonable to expect that certain views of a scene are more informative and would be preferred over others. However, this has not been well studied.
Studies using artificial scenes (a collection of objects on a surface) have shown that scene learning is viewpoint dependent, but recognition is fastest not just for learned views, but also for standardized or interpolated versions of the learned views (Diwadkar & McNamara, 1997; Waller, 2006; Waller et al., 2009). For example, after learning an off-center view of a scene, viewers recognize the centered view of the scene about as quickly as the learned view.

There is also some evidence that there are “best” views of real-world places. Studies of large photo databases have shown that different photographers tend to select the same views when taking photos in the same location, suggesting that there is good agreement on the “best” views of these scenes (Simon, Snavely, & Seitz, 2007). Clustering analyses of the photographs can produce a set of representative views which are highly characteristic and recognizable, but it is not clear that these are the “canonical” views in the sense of Palmer, Rosch, and Chase (1981). For example, the most commonly photographed view in a particular cathedral could be a close-up view of a famous statue in the cathedral, but this view would probably not be considered the “best” view of the cathedral, nor would it be

Reference(s)