Control, Multiple Description, and Purpose in the Visual Perception of Complex Scenes: A Progress Report

Unknown author (1975-08)

This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-75-C-0643.

Working Paper

This memo describes a vision program for recognizing simple furniture built from assemblies of blocks, in which the same item may be composed in diverse ways. As such, it is concerned with three theoretical issues: perceptual processing, suppression of unwanted detail, and the segregation and interconnection of information. The program's perceptual processing relies on an elaborate, redundant, alterable model of the scene rather than on any clever process structure. This approach both aids the interpretation of incomplete or ambiguous portions of the scene and simplifies the program. The model is capable of quantitative as well as qualitative alteration, by means of a constraint-propagation system and a system of frame-shift demons. The hierarchical nature of the scene - assemblies of assemblies of blocks - is reflected as hierarchy in the model. Each assembly is represented as having an external aspect, by which it relates to surrounding assemblies, and an internal aspect, listing the parts and relationships composing it. This imposes a natural suppression of detail. In addition to the vertical layering of the model, there are horizontal subdivisions adapted for different computational purposes: a 2D section representing the image, a 3D section representing the shape, and a stability section representing the physical forces and moments acting upon each unit. Each of the sections can be used through any of several indirect reference frames corresponding to different spatial viewpoints. Many computations on the model, such as stability analysis, spatial relationships, and visual matching, are greatly simplified by first selecting the proper spatial viewpoints.
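The split between external and internal aspects can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the program's actual representation: the class `Assembly`, its field names, the function `describe`, and the table-and-lamp scene are all hypothetical. Each unit carries its relations to neighbours (external aspect) separately from its parts and their mutual relations (internal aspect), so a description that stops at a given level never sees the detail beneath it.

```python
from dataclasses import dataclass, field


@dataclass
class Assembly:
    """Hypothetical model unit: a block or an assembly of smaller units."""
    name: str
    # External aspect: (relation, neighbour name) pairs linking this unit
    # to the assemblies around it.
    external: list = field(default_factory=list)
    # Internal aspect: the component parts and the relations among them.
    parts: list = field(default_factory=list)
    internal_relations: list = field(default_factory=list)


def describe(unit, depth):
    """List the unit names visible when internal aspects are opened
    `depth` levels down; anything deeper stays suppressed."""
    names = [unit.name]
    if depth > 0:
        for part in unit.parts:
            names.extend(describe(part, depth - 1))
    return names


# Hypothetical scene: a table (a top resting on two legs) supporting a lamp.
top, leg1, leg2 = Assembly("top"), Assembly("leg1"), Assembly("leg2")
table = Assembly(
    "table",
    external=[("supports", "lamp")],
    parts=[top, leg1, leg2],
    internal_relations=[("top", "rests-on", "leg1"),
                        ("top", "rests-on", "leg2")],
)
lamp = Assembly("lamp", external=[("rests-on", "table")])
scene = Assembly("scene", parts=[table, lamp])

print(describe(scene, 1))  # table and lamp only; the table's parts are hidden
print(describe(scene, 2))  # the table's top and legs now appear
```

At depth 1 the table is visible only through its external aspect, which is the sense in which the hierarchy imposes a suppression of detail.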