Realistic Modeling of Simple and Complex Cell Tuning in the HMAXModel, and Implications for Invariant Object Recognition in Cortex

Unknown author (2004-07-27)

Riesenhuber \& Poggio recently proposed a model of object recognitionin cortex which, beyond integrating general beliefs about the visualsystem in a quantitative framework, made testable predictions aboutvisual processing. In particular, they showed that invariant objectrepresentation could be obtained with a selective pooling mechanismover properly chosen afferents through a {\sc max} operation: Forinstance, at the complex cells level, pooling over a group of simplecells at the same preferred orientation and position in space but atslightly different spatial frequency would provide scale tolerance,while pooling over a group of simple cells at the same preferredorientation and spatial frequency but at slightly different positionin space would provide position tolerance. Indirect support for suchmechanisms in the visual system come from the ability of thearchitecture at the top level to replicate shape tuning as well asshift and size invariance properties of ``view-tuned cells'' (VTUs)found in inferotemporal cortex (IT), the highest area in the ventralvisual stream, thought to be crucial in mediating object recognitionin cortex. There is also now good physiological evidence that a {\scmax} operation is performed at various levels along the ventralstream. However, in the original paper by Riesenhuber \& Poggio,tuning and pooling parameters of model units in early and intermediateareas were only qualitatively inspired by physiological data. Inparticular, many studies have investigated the tuning properties ofsimple and complex cells in primary visual cortex, V1. We show thatunits in the early levels of HMAX can be tuned to produce realisticsimple and complex cell-like tuning, and that the earlier findings onthe invariance properties of model VTUs still hold in this morerealistic version of the model.