Representing Visual Information
Vision is the construction of efficient symbolic descriptions from images of the world. An important aspect of vision is the choice of representations for the different kinds of information in a visual scene. In the early stages of the analysis of an image, the representations used depend more on what it is possible to compute from an image than on what is ultimately desirable, but later representations can be more sensitive to the specific needs of recognition. This essay surveys recent work in vision at M.I.T. from a perspective in which the representational problems assume a primary importance. An overall framework is suggested for visual information processing, in which the analysis proceeds through three representations; (1) the primal sketch, which makes explicit the intensity changes and local two-dimensional geometry of an image (2) the 2 1/2-D sketch, which is a viewer-centered representation of the depth, orientation and discontinuities of the visible surfaces, and (3) the 3-D model representation, which allows an object-centered description of the three-dimensional structure and organization of a viewed shape. Recent results concerning processes for constructing and maintaining these representations are summarized and discussed.