Discovering object categories in image collections
Given a set of images containing multiple object categories,we seek to discover those categories and their image locations withoutsupervision. We achieve this using generative modelsfrom the statistical text literature: probabilistic Latent SemanticAnalysis (pLSA), and Latent Dirichlet Allocation (LDA). In text analysisthese are used to discover topics in a corpus using the bag-of-wordsdocument representation. Here we discover topics as object categories, sothat an image containing instances of several categories is modelled as amixture of topics.The models are applied to images by using avisual analogue of a word, formed by vector quantizing SIFT like regiondescriptors. We investigate a set of increasingly demanding scenarios,starting with image sets containing only two object categories through tosets containing multiple categories (including airplanes, cars, faces,motorbikes, spotted cats) and background clutter. The object categoriessample both intra-class and scale variation, and both the categories andtheir approximate spatial layout are found without supervision.We also demonstrate classification of unseen images and images containingmultiple objects. Performance of the proposed unsupervised method is compared tothe semi-supervised approach of Fergus et al.