MTECH PROJECTS
Learning Hierarchical Space Tiling for Scene Modeling, Parsing and Attribute Tagging

A typical scene category contains an enormous number of distinct scene configurations composed of objects and regions of varying shapes in different layouts. In this paper, we first propose a representation named Hierarchical Space Tiling (HST) to quantize the huge and continuous scene configuration space. We then augment the HST with attributes (nouns and adjectives) that describe the semantics of the objects and regions inside a scene. We present a weakly supervised method for simultaneously learning the scene configurations and attributes from a collection of natural images associated with descriptive text. The precise locations of the attributes are unknown in the input and are mapped to HST nodes through learning. Starting with a full HST, we iteratively estimate the HST model under a learning-by-parsing framework. Given a test image, we compute the most probable parse tree, together with its associated attributes, by dynamic programming. We quantitatively analyze the representational efficiency of the HST and show that the learned representation is less ambiguous and has semantically meaningful inner concepts. We apply the model to four tasks: scene classification, attribute recognition, attribute localization, and pixel-wise scene labeling, and show improvements in both performance and efficiency.
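To illustrate the dynamic-programming parse step described above, here is a minimal sketch of computing a most probable tiling of an image region, where each terminal tile is explained by a single attribute label. This is an assumption-laden toy, not the paper's implementation: the LABELS grid, tile_score, and best_parse names are illustrative placeholders, and a real HST model would score learned appearance and attribute potentials over the quantized tiling hierarchy rather than a toy label map.

```python
# Sketch only: DP over quantized region splits, standing in for parsing with
# a hierarchical space tiling. LABELS, tile_score, and the scoring constants
# are assumptions for illustration, not the paper's model.
from collections import Counter
from functools import lru_cache

# Toy 4x4 "attribute map" standing in for pixel-wise evidence (assumption).
LABELS = [
    ["sky",  "sky",  "sky",  "sky"],
    ["sky",  "sky",  "tree", "tree"],
    ["road", "road", "tree", "tree"],
    ["road", "road", "road", "road"],
]
H, W = len(LABELS), len(LABELS[0])

def tile_score(x0, y0, x1, y1):
    """Score of explaining region [x0,x1) x [y0,y1) with one attribute:
    count of the dominant label minus a small per-tile cost."""
    cells = [LABELS[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    return Counter(cells).most_common(1)[0][1] - 1.5

@lru_cache(maxsize=None)
def best_parse(x0, y0, x1, y1):
    """Return (score, tree) for the most probable tiling of the region."""
    best = (tile_score(x0, y0, x1, y1), ("tile", (x0, y0, x1, y1)))
    for cut in range(x0 + 1, x1):            # vertical cuts (OR choices)
        s_l, t_l = best_parse(x0, y0, cut, y1)
        s_r, t_r = best_parse(cut, y0, x1, y1)
        if s_l + s_r > best[0]:
            best = (s_l + s_r, ("vsplit", cut, t_l, t_r))
    for cut in range(y0 + 1, y1):            # horizontal cuts
        s_t, t_t = best_parse(x0, y0, x1, cut)
        s_b, t_b = best_parse(x0, cut, x1, y1)
        if s_t + s_b > best[0]:
            best = (s_t + s_b, ("hsplit", cut, t_t, t_b))
    return best

if __name__ == "__main__":
    score, tree = best_parse(0, 0, W, H)
    print("best score:", score)
    print("parse tree:", tree)
```

The memoized recursion visits each axis-aligned sub-rectangle of the quantized grid once, which is the property that makes exact most-probable-parse computation tractable; the paper's parser applies the same principle over the learned HST with attribute-augmented nodes.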