Visual Perception Laboratory


Research Topics

    Figure-ground organization (FGO)

    Using a binocular camera, our robot solves both 3D FGO and 2D FGO. 3D FGO refers to detecting the floor and walls and estimating the positions, sizes and orientations of objects on the floor. This representation includes information about where objects end on their invisible back sides, as well as about the invisible spaces behind objects. This is accomplished from only one pair of images acquired from a single vantage point. 2D FGO refers to finding the regions in the 2D images that represent individual objects. Go here to see examples of 3D and 2D FGO of a static scene. For the solution of FGO of a dynamic scene, go here.

    Recovering 3D scenes

    We recently demonstrated that perception of 3D natural scenes is veridical. This is true with both monocular and binocular viewing. So, the visual space is Euclidean with hardly any systematic errors. This agrees with our common sense and finally straightens out 100 years of research based on very impoverished stimuli, which led to results that never generalized outside the laboratory conditions used to produce them in the first place. If our 3D vision were distorted, we wouldn't be able to walk around, drive and build things together. Note that 3D veridical vision is computationally very difficult. It is an ill-posed inverse problem whose solution critically depends on the operation of effective a priori constraints such as the symmetry and three-dimensionality of objects, the direction of gravity, and the orientation of the ground surface. If the viewing conditions are impoverished, constraints cannot be applied and vision is unreliable and biased. But this is a result of the mathematics of the problem, not of the perceptual mechanisms. The claim that perception is not veridical has been the biggest blunder in visual science to date, in a science with quite a few faults. To see an example of how Čapek, our robot, recovers a 3D scene, go here.

    Robot Navigation

    Recent advances in our 3D shape recovery, 3D scene recovery and our treatment of the Traveling Salesman Problem (TSP) have made it possible to propose a new approach in robotics for the study of visually based navigation. Our robot recovers a spatially global map of the 3D scene, including the invisible back parts of objects and the invisible spaces among objects. Having a spatially global map produced from a single viewing point is essential for planning and executing navigation within dynamic environments. An example of the robot's blind navigation is shown here. See also the news release by the Purdue Research Foundation.

    Shape Perception

    Shape is arguably the most important characteristic of objects because it has sufficient complexity to allow unambiguous identification of objects. Shape, and only shape, can be characterized by very many parameters (theoretically, infinitely many). Other visual properties, such as color, lightness, size or speed, can be characterized by at most three. The complexity of the property of objects called "shape" makes it possible to recognize and recover objects without using any information about the context in which they appear. Neither memory nor depth perception is required to do this. Shape constancy is unlike all other perceptual constancies in that it is achieved by using a priori simplicity constraints (aka "priors") such as symmetry and 3D compactness, rather than by taking contextual properties into account. Shape constancy is a critical perceptual property because it allows our percepts to be "veridical". By veridical, we mean that we see 3D shapes the way they are "out there". Note that entire 3D shapes can be recovered, including the invisible back parts as well as the visible front ones.

    See demos of shape recovery

    Contribution of stereoacuity to 3D shape recovery

    Stereoacuity refers to the binocular ability to judge the depth order of features. Stereoacuity is a hyperacuity which, in technical jargon, means subpixel resolution. Stereoacuity has never been used in theories of shape perception because it does not allow depth intervals to be reconstructed. So, all previous theories of binocular shape perception used ordinary binocular disparity. It turns out that when stereoacuity is combined with an a priori symmetry constraint, the recovery result is absolutely perfect. How this works is illustrated in this animation.

    Symmetry and skewed symmetry

    Most natural objects are symmetrical: animals are symmetrical because of the way they move, plants are symmetrical because of the way they grow, and man-made objects are symmetrical because of the functions they serve. Once the utility and omnipresence of symmetry are appreciated, one should expect symmetry to be used by visual systems (both human and computer) as an important a priori constraint (an assumption) designed to allow them to produce accurate perceptual interpretations of the 3D shapes of objects in their natural environment. Using symmetry effectively for this purpose is complicated by the fact that the 2D retinal image of a symmetrical 3D object is always asymmetrical. Note, however, that the symmetry of the object is only distorted in its 2D image. It is not destroyed. We have been able to show that the human visual system is able to detect the distorted (skewed) symmetry inherent in a 2D retinal image and then use this information to recover the shape of the symmetrical 3D object. Several examples of automatic recovery from real images can be seen here.

    Note, however, that 3D symmetry is not sufficient for reliable recovery: it turns out that any 2D retinal image has 3D symmetrical interpretations. Here are example 1 and example 2. For 3D symmetry to be fully effective, additional constraints, such as planarity, must be used as well (see example 3).
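
    The remark above that symmetry is distorted but not destroyed in the image can be checked numerically. The following is only a minimal sketch under simplifying assumptions that are ours, not the lab's: the projection is orthographic, the symmetry plane is x = 0, and all function names and the sample points in HALF are hypothetical. It is not the recovery algorithm itself. It shows that, in a generic view, the segments joining corresponding image points stay parallel (skewed symmetry) even though the image is no longer mirror-symmetric.

```python
import math

def mirror(p):
    """Reflect a 3D point across the plane x = 0 (the assumed symmetry plane)."""
    return (-p[0], p[1], p[2])

def view(p, yaw, pitch):
    """Rotate a 3D point (yaw about y, then pitch about x) and drop depth.

    Dropping z is an orthographic projection, a simplifying assumption.
    """
    x, y, z = p
    cy, sy = math.cos(yaw), math.sin(yaw)
    x, z = cy * x + sy * z, -sy * x + cy * z
    cp, sp = math.cos(pitch), math.sin(pitch)
    y = cp * y - sp * z
    return (x, y)

def image_pairs(half, yaw, pitch):
    """Project each point of `half` together with its mirror twin."""
    return [(view(p, yaw, pitch), view(mirror(p), yaw, pitch)) for p in half]

def symmetry_lines_parallel(pairs, tol=1e-9):
    """True if all segments joining corresponding image points are parallel."""
    dirs = [(b[0] - a[0], b[1] - a[1]) for a, b in pairs]
    rx, ry = dirs[0]
    return all(abs(rx * dy - ry * dx) < tol for dx, dy in dirs)

def image_is_mirror_symmetric(pairs, tol=1e-9):
    """True if the 2D image itself is mirror-symmetric.

    This requires parallel symmetry lines whose midpoints all lie on one
    axis perpendicular to those lines.
    """
    if not symmetry_lines_parallel(pairs, tol):
        return False
    a0, b0 = pairs[0]
    dx, dy = b0[0] - a0[0], b0[1] - a0[1]
    m0 = ((a0[0] + b0[0]) / 2, (a0[1] + b0[1]) / 2)
    for a, b in pairs:
        mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
        if abs((mx - m0[0]) * dx + (my - m0[1]) * dy) > tol:
            return False
    return True

# One half of a hypothetical mirror-symmetric 3D object (illustrative values).
HALF = [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0), (1.0, 2.0, -1.0), (3.0, -1.0, 2.0)]
```

    With a generic view such as image_pairs(HALF, 0.6, 0.3), symmetry_lines_parallel holds while image_is_mirror_symmetric does not. With the frontal view image_pairs(HALF, 0.0, 0.0), both hold.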

    Problem Solving

    Problem solving is one of human beings' fundamental cognitive abilities. It is at least as important as the other, more commonly studied mental activities, namely perception, memory, decision making and learning. We approach problem solving by adopting an information-processing methodology and use it to study computationally difficult (intractable) problems that can be presented to the subject visually, for example, the Traveling Salesman Problem. Human subjects produce near-optimal solutions to such combinatorial optimization problems in linear time. A hierarchical (pyramid) algorithm is the only model that can emulate human performance. It performs fine-to-coarse or coarse-to-fine hierarchical clustering of states (cities) and then produces a solution tour by using a sequence of successive approximations in a coarse-to-fine direction. The model emulates the non-uniform distribution of receptors in the human retina, as well as the eye movements that move the model's attention. See a demo that shows how the model solves a 50-city TSP.

    Recently, we modified the model so that its working memory can store only a few pieces of information at a time. This modification did not reduce the quality or the speed of the solution. Four demos illustrate how the model's visual representation zooms out and zooms in during the process of analyzing spatially global and spatially local features of the problem.

    demo 2

    demo 3

    demo 4

    demo 5
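
    The coarse-to-fine idea described above can be illustrated with a short sketch. This is not the lab's pyramid model: it is a deliberately simplified stand-in that we assume for illustration only. It clusters cities by repeated median splits, starts from a single cluster, and refines the tour level by level, ordering each pair of child clusters to fit between the neighbors of their parent. It omits the non-uniform retina, eye movements and limited working memory, and the names coarse_to_fine_tour and tour_length are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def tour_length(cities, tour):
    """Length of the closed tour visiting `cities` in the order given by `tour`."""
    return sum(dist(cities[tour[i - 1]], cities[tour[i]]) for i in range(len(tour)))

def coarse_to_fine_tour(cities):
    """Return a tour (list of city indices) built by successive refinement."""
    if not cities:
        return []

    def center(cluster):
        n = len(cluster)
        return (sum(cities[i][0] for i in cluster) / n,
                sum(cities[i][1] for i in cluster) / n)

    def split(cluster):
        # Median split along the wider axis of the cluster's bounding box.
        xs = [cities[i][0] for i in cluster]
        ys = [cities[i][1] for i in cluster]
        axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
        ordered = sorted(cluster, key=lambda i: cities[i][axis])
        mid = len(ordered) // 2
        return ordered[:mid], ordered[mid:]

    # Coarsest level: one cluster holding every city.
    tour = [list(range(len(cities)))]
    while any(len(c) > 1 for c in tour):
        refined = []
        for k, cluster in enumerate(tour):
            if len(cluster) == 1:
                refined.append(cluster)
                continue
            a, b = split(cluster)
            # Neighbors in the current approximation of the tour.
            prev_pt = center(refined[-1]) if refined else center(tour[k - 1])
            next_pt = center(tour[(k + 1) % len(tour)])
            ca, cb = center(a), center(b)
            # Keep the child order that gives the shorter local path.
            if dist(prev_pt, ca) + dist(cb, next_pt) <= dist(prev_pt, cb) + dist(ca, next_pt):
                refined += [a, b]
            else:
                refined += [b, a]
        tour = refined
    return [c[0] for c in tour]
```

    For the four corners of a unit square given as [(0, 0), (1, 1), (0, 1), (1, 0)], this sketch returns the tour [0, 2, 1, 3], which has length 4. No claim is made that it matches the solution quality or the running time of the pyramid model.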

    Phi Phenomenon

    In 1912, Max Wertheimer (1880-1943), the founder of the Gestalt School of Psychology, published a monograph on the perception of apparent motion that profoundly influenced subsequent perceptual research and theory. Wertheimer's contribution was inspired by his serendipitous observation of what he called a "pure" apparent movement. It was pure in the sense that the motion was not associated with perceiving any object changing its location in space. He called this pure motion the "phi-phenomenon" to distinguish it from "optimal" apparent movement (called "beta"). In the demo you can see beta and "magniphi", which is our vivid version of Wertheimer's phi.


Yll Haxhimusa. Created: March 28, 2008; Last change: April 11th, 2008