Figure-ground organization (FGO)
Using a binocular camera, our robot solves both 3D FGO and 2D FGO. 3D FGO refers to detecting the floor and walls and estimating the positions, sizes and orientations
of objects on the floor. This representation includes information about where objects end on their back, invisible sides, as well as about invisible spaces behind objects.
This is accomplished based on only one pair of images acquired from a single vantage point. 2D FGO refers to finding regions in the 2D images representing individual objects.
Go here to see examples of 3D and 2D FGO of a static scene. For the solution of FGO of a dynamic scene go here.
Recovering 3D scenes
We recently demonstrated that perception of 3D natural scenes is veridical. This is true with both monocular and binocular viewing. So, the visual space is Euclidean
with hardly any systematic errors. This agrees with our common sense and finally straightens out 100 years of research based on very impoverished stimuli leading to
results that never generalized beyond the laboratory conditions that produced them in the first place. If our 3D vision were distorted, we wouldn't be able to
walk around, drive and build things together. Note that 3D veridical vision is computationally very difficult. It is an ill-posed inverse problem whose solution critically
depends on the operation of effective a priori constraints such as symmetry and three-dimensionality of objects, the direction of gravity, and the orientation of the ground surface.
If the viewing conditions are impoverished, constraints cannot be applied and vision is unreliable and biased. But this is a result of the mathematics of the problem, not
of the perceptual mechanisms. The claim that perception is not veridical has been the biggest blunder in visual science to date, in a science with quite a few faults.
To see an example of how Čapek, our robot, recovers a 3D scene go here.
Robot Navigation
Recent advances in our 3D shape recovery, 3D scene recovery and our treatment of the
Traveling Salesman Problem (TSP) have made it possible to propose a new approach in robotics for the study of visually based
navigation. Our robot recovers a spatially global map of the 3D scene, including the back, invisible parts of objects and invisible spaces among objects.
Having a spatially global map produced from a single viewing point is essential for planning and executing navigation within dynamic environments. An example of
the robot's blind navigation is shown here. See the news release by the
Purdue Research Foundation.
Shape Perception
Shape is arguably the most important characteristic of objects because it has sufficient complexity to allow unambiguous
identification of objects. Shape, and only shape, can be characterized by very many parameters (theoretically, infinitely
many). Other visual properties, such as color, lightness, size or speed, can be characterized by at most three parameters. The complexity
of the property of objects called "shape" makes it possible to recognize and recover them without using any information
about the context in which they appear. Neither memory nor depth perception is required to do this. Shape
constancy is unlike all other perceptual constancies in that it is achieved by using a priori simplicity constraints
(aka "priors") such as symmetry and 3D compactness, rather than by taking contextual properties into account. Shape constancy
is a critical perceptual property because it allows our percepts to be "veridical". By veridical, we mean that we see 3D shapes
the way they are "out there". Note that entire 3D shapes can be recovered, including the back invisible parts, as well as the front, visible ones.
See demos of shape recovery
Contribution of stereoacuity to 3D shape recovery
Stereoacuity refers to the binocular ability to judge the depth order of features. Stereoacuity is a hyperacuity which, in technical jargon, means
subpixel resolution. Stereoacuity has never been used in theories of shape perception because it does not allow reconstructing depth intervals.
So, all previous theories of binocular shape perception used ordinary binocular disparity. It turns out that when stereoacuity is combined with
a symmetry a priori constraint, the recovery result is absolutely perfect. How this works is illustrated in this
animation.
Symmetry and skewed symmetry
Most natural objects are symmetrical: animals are symmetrical because of the way they move, plants are symmetrical because of the way they grow,
and man-made objects are symmetrical because of the functions they serve. Once the
utility and omnipresence of symmetry are appreciated, one should expect symmetry to be used by visual systems (both human and computer)
as an important a priori constraint (an assumption) designed to allow them to produce accurate perceptual interpretations of the 3D shapes
of objects in their natural environment. Using symmetry effectively for this purpose is complicated by the fact that the 2D retinal image
of a symmetrical 3D object is always asymmetrical, but note that the symmetry of the object is only distorted in its 2D image. It is not
destroyed. We have been able to show that the human visual system is able to detect the distorted (skewed) symmetry inherent in a 2D retinal
image and then use this information to recover the shape of the symmetrical 3D object. Several examples of automatic recovery from real images
can be seen here.
Note, however, that 3D symmetry is not sufficient for reliable recovery: it turns out that any 2D retinal image has 3D symmetrical interpretations.
Here are example 1 and example 2. For 3D symmetry
to be fully effective, additional constraints, such as planarity, must be used as well - see example 3.
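The claim that symmetry is distorted but not destroyed in the image can be illustrated with a minimal sketch. It assumes an orthographic projection (a simplification; a real camera or eye is perspective) and a mirror-symmetric object made of a few hypothetical points, `HALF`, reflected in the plane x = 0. All function names are illustrative and are not taken from our recovery software. The sketch checks that the image lines joining symmetric pairs of points stay parallel to one another from an arbitrary viewpoint, which is the signature of skewed symmetry.

```python
import math

def rotate(p, yaw, pitch):
    """Rotate a 3D point about the vertical (y) axis, then about the horizontal (x) axis."""
    x, y, z = p
    x, z = math.cos(yaw) * x + math.sin(yaw) * z, -math.sin(yaw) * x + math.cos(yaw) * z
    y, z = math.cos(pitch) * y - math.sin(pitch) * z, math.sin(pitch) * y + math.cos(pitch) * z
    return (x, y, z)

def project(p):
    """Orthographic projection (an assumption of this sketch): drop the depth coordinate."""
    return (p[0], p[1])

def mirror_pairs(half):
    """Pair every point with its reflection in the symmetry plane x = 0."""
    return [(p, (-p[0], p[1], p[2])) for p in half]

def symmetry_line_directions(pairs, yaw, pitch):
    """Image-plane vectors joining the two projected points of each pair."""
    directions = []
    for a, b in pairs:
        a2 = project(rotate(a, yaw, pitch))
        b2 = project(rotate(b, yaw, pitch))
        directions.append((a2[0] - b2[0], a2[1] - b2[1]))
    return directions

def all_parallel(vectors, tol=1e-9):
    """True if every vector is parallel to the first one (2D cross product near zero)."""
    ref = vectors[0]
    return all(abs(ref[0] * v[1] - ref[1] * v[0]) < tol for v in vectors)

# Hypothetical "half" of a mirror-symmetric object; the other half is its reflection.
HALF = [(1.0, 0.0, 0.0), (0.5, 1.0, 0.3), (2.0, -0.5, 1.0), (1.5, 0.7, -0.8)]

if __name__ == "__main__":
    directions = symmetry_line_directions(mirror_pairs(HALF), yaw=0.6, pitch=0.3)
    print("symmetry lines parallel in the image:", all_parallel(directions))
```

Under perspective projection the same lines would converge to a vanishing point instead of staying parallel, so the parallelism test would have to be replaced by a concurrency test.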
Problem Solving
Problem solving is one of human beings' fundamental cognitive abilities. It is at least as important as the other
more commonly studied mental activities, namely, perception, memory, decision making and learning. We approach problem solving by adopting
an information-processing methodology and use it to study computationally difficult (intractable) problems that can be
presented to the subject visually, for example, the Traveling Salesman Problem. Human subjects produce near-optimal solutions
to such combinatorial optimization problems in linear time. A hierarchical (pyramid) algorithm is the only model that can emulate human performance.
It performs fine-to-coarse or coarse-to-fine hierarchical clustering of states (cities) and then produces a solution tour by using a sequence of
successive approximations in a coarse-to-fine direction. The model emulates the non-uniform distribution of receptors in the human retina, as well as
eye movements that move the model's attention. See a demo that shows how the model solves a 50-city TSP.
Recently, we modified the model so that its working memory can store only a few pieces of information at a time. This modification did not
reduce the quality or the speed of the solution. Four demos illustrate how the model's visual representation zooms out and zooms in during the process of analyzing
spatially global and spatially local features of the problem.
demo 2
demo 3
demo 4
demo 5
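The coarse-to-fine idea described in this section can be sketched in a few lines. This is a deliberately simplified illustration, not our pyramid model: it has no retina-like non-uniform resolution, no eye movements and no working-memory limit, and because it sorts the cities at every level it does not run in linear time. It starts with all cities in one cluster, repeatedly splits every cluster in two along its wider axis, and at each level orders the two halves so that the tour through cluster centroids stays short. The function names are hypothetical.

```python
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(tour):
    """Length of the closed tour that visits the points in the given order."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def split(cluster):
    """Split a cluster into two non-empty halves along its wider axis."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(cluster, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def coarse_to_fine_tour(cities):
    """Refine a tour over clusters level by level until every cluster is one city."""
    if not cities:
        return []
    tour = [list(cities)]  # coarsest level: a single cluster holding every city
    while any(len(c) > 1 for c in tour):
        refined = []
        for i, cluster in enumerate(tour):
            if len(cluster) == 1:
                refined.append(cluster)
                continue
            a, b = split(cluster)
            # Neighbours in the current tour: the already refined predecessor
            # (or the last coarse cluster for i == 0) and the coarse successor.
            prev_c = centroid(refined[-1] if refined else tour[i - 1])
            next_c = centroid(tour[(i + 1) % len(tour)])
            ca, cb = centroid(a), centroid(b)
            if dist(prev_c, ca) + dist(cb, next_c) <= dist(prev_c, cb) + dist(ca, next_c):
                refined += [a, b]
            else:
                refined += [b, a]
        tour = refined
    return [c[0] for c in tour]

if __name__ == "__main__":
    rng = random.Random(0)
    cities = [(rng.random(), rng.random()) for _ in range(50)]
    tour = coarse_to_fine_tour(cities)
    print("random order:", round(tour_length(cities), 2))
    print("coarse-to-fine:", round(tour_length(tour), 2))
```

Each pass of the `while` loop corresponds to moving one level down a pyramid of clusters, so the tour is a sequence of successive approximations that becomes a tour of individual cities at the finest level.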
Phi Phenomenon
In 1912, Max Wertheimer (1880-1943), the founder of the Gestalt School of Psychology, published a monograph on the perception of apparent motion
that profoundly influenced subsequent perceptual research and theory. Wertheimer's contribution was inspired by his serendipitous observation of
what he called a "pure" apparent movement. It was pure in the sense that the motion was not associated with perceiving any object changing its
location in space. He called this pure motion the "phi-phenomenon" to distinguish it from "optimal" apparent movement (called "beta"). In the
demo you can see beta and "magniphi", which is our vivid version of Wertheimer's phi.