Review · Open access · Peer reviewed

Binocular Disparity and the Perception of Depth

1997; Cell Press; Volume: 18; Issue: 3; Language: English

10.1016/s0896-6273(00)81238-6

ISSN

1097-4199

Authors

Ning Qian

Topic(s)

Retinal Development and Disorders

Abstract

We perceive the world in three dimensions even though the input to our visual system, the images projected onto our two retinas, has only two spatial dimensions. How is this accomplished? It is well known that the visual system can infer the third dimension, depth, from a variety of visual cues in the retinal images. One such cue is binocular disparity, the positional difference between the two retinal projections of a given point in space (Figure 1). This positional difference results from the fact that the two eyes are laterally separated and therefore see the world from two slightly different vantage points. The idea that retinal disparity contributes critically to depth perception derives from the invention of the stereoscope by Wheatstone in the 19th century, with which he showed conclusively that the brain uses horizontal disparity to estimate the relative depths of objects in the world with respect to the fixation point, a process known as stereoscopic depth perception or stereopsis. Because simple geometry provides relative depth given retinal disparity, the problem of understanding stereo vision reduces to the question: How does the brain measure disparity from the two retinal images in the first place?

Since Wheatstone's discovery, students of vision science have used psychophysical, physiological, and computational methods to unravel the brain's mechanisms of disparity computation. In 1960, Julesz made the important contribution of showing that stereo vision does not require monocular depth cues such as shading and perspective (see Julesz 1971). This was demonstrated through his invention of random dot stereograms. A typical stereogram consists of two images of randomly distributed dots that are identical except that a central square region of one image is shifted horizontally by a small distance with respect to the other image (see Figure 6a for an example).
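
The construction just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `make_stereogram`, the image size, the dot density, and the choice to refill the strip uncovered by the shift with fresh random dots are all hypothetical and are not taken from the article.

```python
import random

def make_stereogram(size=64, square=24, shift=2, density=0.5, seed=0):
    """Return (left, right) binary dot images as lists of rows.

    The right image is a copy of the left one, except that a central
    square region is displaced horizontally by `shift` pixels
    (assumed non-negative and small enough to stay inside the image).
    """
    rng = random.Random(seed)
    left = [[1 if rng.random() < density else 0 for _ in range(size)]
            for _ in range(size)]
    right = [row[:] for row in left]
    lo = (size - square) // 2
    hi = lo + square
    for y in range(lo, hi):
        # Copy the central square into its shifted position.
        for x in range(lo, hi):
            right[y][x + shift] = left[y][x]
        # Assumption: refill the uncovered strip with new random dots.
        for x in range(lo, lo + shift):
            right[y][x] = 1 if rng.random() < density else 0
    return left, right

left, right = make_stereogram()
print(len(left), len(left[0]))
```

Each image on its own is an unstructured field of dots; the square is defined only by the relation between the two images.
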
When each image is viewed individually, it appears as nothing more than a flat field of random dots. However, when the two images are viewed dichoptically (i.e., the left and right images are presented to the left and right eyes, respectively, at the same time), the shifted central square region "jumps" out vividly at a different depth. This finding demonstrates that the brain can compute binocular disparity without much help from other visual modalities.

Figure 3. Schematic Drawings Illustrate the Left–Right Receptive Field (RF) Shift of Binocular Simple Cells. The + and − signs represent the on and off subregions within the receptive fields, respectively. Two different models for achieving the shift have been suggested by physiological experiments. (a) Position-shift model. According to this model, the left and right receptive fields of a simple cell have identical shapes but have an overall horizontal shift between them (Bishop and Pettigrew 1986). (b) Phase-difference model. This model assumes that the shift is between the on–off subregions within the left and right receptive field envelopes that spatially align (Ohzawa et al. 1990, 1996). The fovea locations on the left and right retinas are drawn as a reference point for vertically aligning the left and right receptive fields of the simple cell.

The first direct evidence of disparity coding in the brain was obtained in the late 1960s, when Pettigrew and coworkers recorded disparity selective cells from the striate cortex in the cat, the primary visual area (see Bishop and Pettigrew 1986). The result came as a surprise. Few people at the time expected to find disparity tuned cells so early in the brain's visual processing stream. A decade later, Poggio and collaborators reported similar findings in awake behaving monkeys (see Poggio and Poggio 1984). They classified cells into a few discrete categories, although it now appears that these categories represent idealized cases from a continuous distribution (LeVay and Voigt 1988). More recently, Ohzawa et al. (1990, 1996) provided detailed quantitative mapping of binocular receptive fields of the cat visual cortical cells and suggested models for simulating their responses.

While these and many other experiments have demonstrated the neural substrates for disparity coding at the earliest stage of binocular convergence, they leave open the question of how a population of disparity selective cells could be used to compute disparity maps from a pair of retinal images such as the stereograms used by Julesz. What is needed, in addition to experimental investigations, is a computational theory (Marr 1982; Churchland and Sejnowski 1992) specifying an algorithm for combining the neuronal signals into a meaningful computational scheme.

Although there have also been many computational studies of stereo vision in the past, until recently, most studies had treated disparity computation mainly as a mathematics or engineering problem while giving only secondary consideration to existing physiological data. Part of this tradition stems from a belief, advanced paradoxically by David Marr, one of the most original thinkers in vision research, that physiological details are not important for understanding information processing tasks (such as visual perception) at the systems level. Marr (1982) argued that a real understanding will only come from an abstract computational analysis of how a particular problem may be solved under certain mathematical assumptions, regardless of the neuronal implementations in the brain. Although the importance of Marr's computational concept cannot be overstated, the main problem with ignoring physiology is that there is usually more than one way to "solve" a given perceptual task. Without paying close attention to physiology, one often comes up with algorithms that work in some sense but have little to do with the mechanisms used by the brain. In fact, most previous stereo vision algorithms contain nonphysiological procedures that could not be implemented with real neurons (see Qian 1994 for a discussion). To understand visual information processing performed by the brain instead of by an arbitrary machine, one obviously has to construct computational theories of vision based on real neurophysiological data.

Such a realistic modeling approach to stereo vision has recently been shown to be possible. Disparity-tuned units, based on the response properties of real binocular cells, can be shown to effectively compute disparity maps from stereograms. Moreover, the stereo algorithm can be extended to include motion detection and provide coherent explanations for some interesting depth illusions and physiological observations. In the discussion that follows, I attempt to recreate the line of thought that led to these models and use them as examples to discuss the general issue of how models such as these can be useful in interpreting relevant experimental data.

As mentioned above, disparity-sensitive cells have been found in the very first stage of binocular convergence, the primary visual cortex. In their classical studies, Hubel and Wiesel (1962) identified two major classes of cells in this area: simple and complex. Simple cells have separate on (excitatory) and off (inhibitory) subregions within their receptive fields that respond to light and dark stimuli, respectively. In contrast, complex cells respond to both types of stimuli throughout their receptive fields. Hubel and Wiesel suggested a hierarchy of anatomical organization according to which complex cells receive inputs from simple cells, which in turn receive inputs from LGN cells. Although the strict validity of this hierarchy is debatable, as some complex cells appear to receive direct LGN inputs, it is generally agreed that the majority of simple and complex cells in the primary visual cortex are binocular (i.e., they have receptive fields on both retinas) and disparity tuned (i.e., they respond differently to different stimulus disparities; see Figure 2).

What, then, are the roles of simple and complex cells in disparity computation? Early physiological experiments (Bishop and Pettigrew 1986) suggested that to achieve disparity tuning, a binocular simple cell has an overall shift between its left and right receptive fields as illustrated in Figure 3a. Others (Ohzawa et al. 1990, 1996) have suggested that the shift is between the excitatory and inhibitory subregions within the aligned receptive field envelopes (Figure 3b).
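
The two alternatives can be made concrete with one-dimensional receptive field profiles. The sketch below assumes a Gabor profile (a Gaussian envelope multiplying a cosine carrier), which is a common modeling choice but is not specified in the passage above; the function names and parameter values are hypothetical.

```python
import math

def gabor(x, center=0.0, phase=0.0, sigma=4.0, freq=0.125):
    """Assumed RF profile: Gaussian envelope times a cosine carrier."""
    u = x - center
    envelope = math.exp(-u * u / (2.0 * sigma * sigma))
    return envelope * math.cos(2.0 * math.pi * freq * u + phase)

def position_shift_pair(xs, shift):
    """Left and right RFs with identical shape, displaced as a whole."""
    left = [gabor(x) for x in xs]
    right = [gabor(x, center=shift) for x in xs]
    return left, right

def phase_difference_pair(xs, dphi):
    """Left and right RFs with aligned envelopes but offset subregions."""
    left = [gabor(x) for x in xs]
    right = [gabor(x, phase=dphi) for x in xs]
    return left, right

xs = list(range(-20, 21))
pos_left, pos_right = position_shift_pair(xs, shift=3)
ph_left, ph_right = phase_difference_pair(xs, dphi=math.pi / 2)
```

In the first pair the right profile is a translated copy of the left one. In the second pair the envelope stays in place and only the on and off subregions move inside it.
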
For simplicity, both of these alternatives will be referred to as "receptive field shift" when it is not essential to distinguish them. These receptive field structures of binocular simple cells could arise from orderly projections of LGN cells with concentric receptive fields, as originally proposed by Hubel and Wiesel (Figure 4).

Since disparity is nothing but a shift between the two retinal projections (Figure 1), one might expect intuitively that such a simple cell should give the best response when the stimulus disparity matches the cell's left–right receptive field shift. In other words, a simple cell might prefer a disparity equal to its receptive field shift. A population of such cells with different shifts would then prefer different disparities, and the unknown disparity of any stimulus could be computed by identifying which cell gives the strongest response to the stimulus. The reason that no stereo algorithm has come out of these considerations is that the very first assumption (that a binocular simple cell has a preferred disparity equal to its receptive field shift) is not valid. Simple cells cannot have a well-defined preferred disparity because their responses depend not only on the disparity but also on the detailed spatial structure of the stimulus (Ohzawa et al. 1990; Qian 1994; Zhu and Qian 1996; Qian and Zhu 1997).
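
A toy calculation illustrates the point. The sketch assumes a position-shift simple cell with a Gabor receptive field profile, a thin vertical line as the stimulus, and a half-wave-rectified, squared output; all of these choices, and the names `simple_cell` and `preferred_disparity`, are assumptions made for illustration only.

```python
import math

def gabor(u, sigma=4.0, freq=0.125):
    """Assumed 1D receptive field profile centered at u = 0."""
    return math.exp(-u * u / (2.0 * sigma * sigma)) * math.cos(2.0 * math.pi * freq * u)

def simple_cell(line_pos, disparity, rf_shift):
    """Response of a position-shift simple cell to a thin vertical line.

    The line falls at `line_pos` on the left retina and at
    `line_pos + disparity` on the right retina. The right RF is the
    left RF displaced by `rf_shift`.
    """
    drive = gabor(line_pos) + gabor(line_pos + disparity - rf_shift)
    return max(0.0, drive) ** 2  # half-wave rectify, then square

def preferred_disparity(line_pos, rf_shift, disparities):
    """Peak location of the disparity tuning curve for one line position."""
    return max(disparities, key=lambda d: simple_cell(line_pos, d, rf_shift))

disparities = range(-8, 9)
for pos in (0, 2, 4):
    print(pos, preferred_disparity(pos, rf_shift=2, disparities=disparities))
```

With these assumed parameters the peak of the tuning curve moves as the line is displaced laterally, instead of staying at the receptive field shift.
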
Although one can measure a disparity tuning curve from a simple cell, the peak location of the curve (i.e., the preferred disparity) changes with some simple manipulations (such as a lateral displacement) of the stimuli. This property is formally known as Fourier phase dependence because the spatial structure of an image is reflected in the phase of its Fourier transform. The Fourier phase dependence of simple cell responses is obviously not desirable from the point of view of extracting a pure disparity signal from which to compute disparity maps.

The phase dependence of simple cell responses can be easily understood by considering the disparity tuning of a simple cell to a vertical line. The Fourier phase of the line is directly related to the lateral position of the line, which will affect where its projection falls on the left and right receptive fields of the simple cell. The line with a given disparity may evoke a strong response at one line position because it happens to project onto the excitatory subregions of both the left and right receptive fields but may evoke a much weaker response at a different position because it now stimulates some inhibitory portion(s) of the receptive fields. Therefore, the response of the simple cell to a fixed disparity changes with the stimulus Fourier phase, and, consequently, the cell cannot have a well-defined preferred disparity. There is direct experimental evidence supporting this conclusion. For example, Ohzawa et al. (1990) found that disparity tuning curves of simple cells measured with bright bars and dark bars (which have different Fourier phases) are very different. The Fourier phase dependence of simple cell responses can also explain an observation by Poggio et al. (1985), who reported that simple cells show no disparity tuning to dynamic random dot stereograms. Each of the stereograms in their experiment maintained a constant disparity over time but varied its Fourier phase from frame to frame by constantly rearranging the dots. Simple cells lost their disparity tuning as a result of averaging over many different (phase-dependent) tuning curves (Qian 1994).

Although simple cells are not suited for disparity computation, complex cell responses have the desired phase independence, as expected from their lack of separate excitatory and inhibitory subregions within their receptive fields (Skottun et al. 1991). To build a working stereo algorithm, however, one needs to specify how this phase independence is achieved and how an unknown stimulus disparity can be recovered from these responses. Most physiology experiments approach stereo vision from the opposite perspective and measure the responses of visual cells to a set of stimuli with known disparities in order to obtain the cells' disparity tuning curves. These curves alone are not very useful from a computational point of view because a response can be read from a disparity tuning curve only when the stimulus disparity is already known. We need a quantitative procedure for computing an unknown disparity in a pair of retinal images from the responses of complex cells to the images.
Fortunately, a method for determining the responses of binocular complex cells has recently been proposed by Ohzawa et al. (1990) based on their quantitative physiological studies (see also Ferster 1981). These investigators found that a binocular complex cell in the cat primary visual cortex can be simulated by summing up the squared responses of a quadrature pair of simple cells, and the simple cell responses, in turn, can be simulated by adding the visual inputs on their left and right receptive fields (see Figure 5). (Two binocular simple cells are said to form a quadrature pair if there is a quarter-cycle shift between the excitatory–inhibitory subregions for both their left and right receptive fields; Ohzawa et al. 1990.) The remaining questions are whether the model complex cells constructed this way are indeed independent of stimulus Fourier phases and, if so, how their preferred disparities are related to their receptive field parameters.

These issues have recently been investigated through mathematical analyses and computer simulations (Qian 1994; Zhu and Qian 1996; Qian and Zhu 1997). The complex cell model was found to be independent of stimulus Fourier phases for some types of stimuli, including the bars used in the physiological experiments of Ohzawa et al. (1990), and its preferred disparity is equal to the left–right receptive field shift within the constituent simple cells. For more complicated stimuli, such as random dot stereograms, however, the complex cell constructed from a single quadrature pair of simple cells is still phase sensitive, although less so than simple cells. This problem can be easily solved by considering the additional physiological fact that complex cells have somewhat larger receptive fields than those of simple cells (Hubel and Wiesel 1962). This fact is incorporated into the model by averaging over several quadrature pairs of simple cells with nearby and overlapping receptive fields to construct a model complex cell (Zhu and Qian 1996; Qian and Zhu 1997). The resulting complex cell is largely phase independent for any stimulus, and its preferred disparity still equals the receptive field shift within the constituent simple cells.

With the above method for constructing model complex cells with well-defined preferred disparities, we are finally ready to develop a stereo algorithm for computing disparity maps from stereograms. By using a population of complex cells with preferred disparities covering the range of interest, the disparity of any input stimulus can be determined by identifying the cell in the population with the strongest response (or by calculating the population averaged preferred disparity of all cells weighted by their responses).
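
The procedure described above can be sketched for a one-dimensional random dot pattern. This is a minimal illustration and not the published implementation. It assumes Gabor receptive field profiles, the position-shift form of the receptive field shift, squaring of the linear binocular sum, a single uniform disparity, and a circular image so that responses can be pooled over every position. The text pools only over nearby receptive fields; pooling over the whole image is a simplification made here so that the peak is easy to verify. All function names and parameter values are hypothetical.

```python
import math
import random

def quadrature_gabors(radius=8, sigma=3.0, freq=0.125):
    """Even (cosine) and odd (sine) profiles: a quarter-cycle apart."""
    even, odd = [], []
    for u in range(-radius, radius + 1):
        env = math.exp(-u * u / (2.0 * sigma * sigma))
        even.append(env * math.cos(2.0 * math.pi * freq * u))
        odd.append(env * math.sin(2.0 * math.pi * freq * u))
    return even, odd

def rf_response(image, profile, center):
    """Linear RF output at `center`, with circular wrap-around."""
    n = len(image)
    radius = len(profile) // 2
    return sum(w * image[(center + k - radius) % n]
               for k, w in enumerate(profile))

def complex_cell(left, right, center, shift, even, odd):
    """Sum of squared responses of a quadrature pair of simple cells.

    Each simple cell adds its left-eye and right-eye inputs; the
    right RF is displaced by `shift` relative to the left RF.
    """
    s_even = (rf_response(left, even, center)
              + rf_response(right, even, center + shift))
    s_odd = (rf_response(left, odd, center)
             + rf_response(right, odd, center + shift))
    return s_even ** 2 + s_odd ** 2

def estimate_disparity(left, right, shifts):
    """Pick the RF shift whose pooled complex cell response is largest."""
    even, odd = quadrature_gabors()
    pooled = {}
    for shift in shifts:
        pooled[shift] = sum(complex_cell(left, right, c, shift, even, odd)
                            for c in range(len(left)))
    return max(pooled, key=pooled.get), pooled

rng = random.Random(0)
n, true_disparity = 128, 3
left = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
right = [left[(x - true_disparity) % n] for x in range(n)]
best, pooled = estimate_disparity(left, right, range(-4, 5))
print(best)
```

A disparity map, as opposed to this single estimate, would repeat the readout at each image location with pooling restricted to a local neighborhood. The response-weighted average of preferred disparities mentioned in the text is an alternative readout that this sketch does not implement.
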
An example of applying this algorithm to a random dot stereogram is shown in Figure 6. The result demonstrates that a population of complex cells can effectively compute the disparity map of the stereogram via a distributed representation.

There is, as yet, no direct anatomical evidence supporting the quadrature pair method for constructing binocular complex cells from simple cells. However, based on the quantitative physiological work of Ohzawa et al. (1990), the method is at least valid as a phenomenological description for a subset of real complex cell responses. In addition, the analyses indicate that the same phase-independent complex cell responses can be obtained by appropriately combining the outputs of many simple or LGN cells without requiring the specific quadrature relationship (Qian 1994; Qian and Andersen 1997).

To measure binocular disparity, the visual system must solve the correspondence problem: it must determine which parts on the two retinal images come from the same object in the world. How does the above stereo algorithm address the correspondence problem? Historically, it has been suggested that the visual system solves the problem by matching up image features between the two retinas. In the case of random dot stereograms, the correspondence problem is often stated as identifying which dot in the left image matches which dot in the right image.
Since all dots in the two images are of identical shape, it is often argued that any two dots could match, and the visual system is faced with an enormously difficult problem of sorting out the true matches from the huge number of false ones. This argument is the starting point for a whole class of stereo algorithms (Marr and Poggio 1976; Prazdny 1985; Qian and Sejnowski 1989; Marshall et al. 1996). It is not physiological, however, because the left and right receptive fields of a typical binocular cell can be much larger than a dot in a stereogram. Even cells in the part of monkey striate cortex that represents the fovea, the area of greatest visual acuity and smallest receptive fields, have a receptive field size of about 0.1 degree (Dow et al. 1981). This dimension is more than twice as large as a dot in the stereogram in Figure 6 when viewed at a distance of >35 cm. A closely related fact is that most cells are broadly tuned to disparity; even the most sharply tuned cells have tuning widths of about 0.1–0.2 degree (Poggio and Poggio 1984; Lehky and Sejnowski 1990). It is therefore difficult to imagine how a cell could match a specific pair of dots while ignoring many others in its receptive fields. It appears more reasonable to assume that a binocular cell tries to match the two image patches covered by its receptive fields (each of which may contain two or more dots) instead of operating on fine image features such as individual dots. Since each image patch is likely to contain a unique dot distribution, it can be best matched by only one (corresponding) patch in the other image. Therefore, for an algorithm that avoids operating at the level of individual dots, the false-match problem is practically nonexistent (Sanger 1988).

The stereo model of the previous section demonstrates that binocular complex cells as described by Ohzawa et al. (1990) have the right physiological property for matching the image patches in their receptive fields. A careful mathematical analysis of the model complex cells reveals that their computation is formally equivalent to summing two related cross-products of the band-pass-filtered left and right image patches (Qian and Zhu 1997). This operation is related to cross-correlation, but it overcomes some major problems with the standard cross-correlator.

A good model should explain more than it was originally designed for. To evaluate the stereo model above, it has been applied to other problems of stereopsis.
For example, the model has been used to explain the observation that we can still perceive depth when the contrasts of the two images in a stereogram are very different, so long as they have the same sign (Qian 1994). Recently, the model has been applied to a psychophysically observed depth illusion. In 1986, Westheimer first described that when a few isolated features are viewed on the fovea, the perceived depth of a given feature depends not only on its own disparity but also on the disparity of neighboring features. Specifically, two vertical line segments at different disparities, separated laterally along the horizontal frontoparallel direction, influence each other's perceived depth in the following way: when the lateral distance between the two lines is small (<∼5 min), the two lines appear closer in depth, as if they are attracting each other. At larger distances, this effect reverses, and the two lines appear further away from each other in depth (repulsion). When the distance is very large, there is no interaction between the lines. To model these effects, the responses of a population of complex cells centered on one line were examined as a function of how they are influenced by the presence of the other line at various lateral distances (Qian and Zhu 1997, ARVO, abstract). The interaction between the lines in the model originates from the lines' simultaneous presence in the cells' receptive fields, and this can naturally explain Westheimer's observation without introducing any ad hoc assumptions (Figure 7). Thus, the psychophysically observed disparity attraction–repulsion phenomenon may be viewed as a direct consequence of the known physiological properties of binocular cells in the visual cortex.
The stereo model also helped interpret a recent physiological observation by Wagner and Frost (1993). Recording from the visual Wulst of the barn owl, Wagner and Frost found that for some cells, the peak locations of a cell's disparity tuning curves to spatial noise patterns and sinusoidal gratings of various frequencies approximately coincide at a certain disparity. They called this disparity the characteristic disparity (CD) of the cell. Wagner and Frost (1993)

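References

Bishop P.O., Pettigrew J.D. (1986). Neural mechanisms of binocular vision. Vision Res. 26: 1587-1600.
Churchland P.S., Sejnowski T.J. (1992). The Computational Brain. Cambridge, Massachusetts.
Dow B.M., Snyder A.Z., Vautin R.G., Bauer R. (1981). Magnification factor and receptive field size in foveal striate cortex of the monkey. Exp. Brain Res. 44: 213-228.
Ferster D. (1981). A comparison of binocular depth mechanisms in areas 17 and 18 of the cat visual cortex. J. Physiol. 311: 623-655.
Hubel D.H., Wiesel T. (1962). Receptive fields, binocular interaction, and functional architecture in the cat's visual cortex. J. Physiol. 160: 106-154.
Julesz B. (1971). Foundations of Cyclopean Perception. University of Chicago Press.
Lehky S.R., Sejnowski T.J. (1990). Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. J. Neurosci. 10: 2281-2299.
LeVay S., Voigt T. (1988). Ocular dominance and disparity coding in cat visual cortex. Vis. Neurosci. 1: 395-414.
Marr D. (1982). Vision. W.H. Freeman and Company, San Francisco.
Marr D., Poggio T. (1976). Cooperative computation of stereo disparity. Science 194: 283-287.
Marshall J.A., Kalarickal G.J., Graves E.B. (1996). Neural model of visual stereomatching: slant, transparency, and clouds. Network.
Ohzawa I., DeAngelis G.C., Freeman R.D. (1990). Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science 249: 1037-1041.
Ohzawa I., DeAngelis G.C., Freeman R.D. (1996). Encoding of binocular disparity by simple cells in the cat's visual cortex. J. Neurophysiol. 75: 1779-1805.
Poggio G.F., Motter B.C., Squatrito S., Trotter Y. (1985). Responses of neurons in visual cortex (V1 and V2) of the alert macaque to dynamic random-dot stereograms. Vision Res. 25: 397-406.
Poggio G.F., Poggio T. (1984). The analysis of stereopsis. Annu. Rev. Neurosci. 7: 379-412.
Prazdny K. (1985). Detection of binocular disparities. Biol. Cybern. 52: 93-99.
Qian N. (1994). Computing stereo disparity and motion with known binocular cell properties. Neural Comp. 6: 390-404.
Qian N., Andersen R.A. (1997). A physiological model for motion–stereo integration and a unified explanation of the Pulfrich-like phenomena. Vision Res., in press.
Qian N., Sejnowski T.J. (1989). Learning to solve random-dot stereograms of dense and transparent surfaces with recurrent backpropagation.
Qian N., Zhu Y. (1997). Physiological computation of binocular disparity. Vision Res., in press.
Sanger T.D. (1988). Stereo disparity computation using Gabor filters. Biol. Cybern. 59: 405-418.
Skottun B.C., DeValois R.L., Grosof D.H., Movshon J.A., Albrecht D.G., Bonds A.B. (1991). Classifying simple and complex cells on the basis of response modulation. Vision Res. 31: 1079-1086.
Wagner H., Frost B. (1993). Disparity-sensitive cells in the owl have a characteristic disparity. Nature 364: 796-798.
Zhu Y., Qian N. (1996). Binocular receptive fields, disparity tuning, and characteristic disparity. Neural Comp. 8: 1647-1677.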