Title: Zero-shot image classification
Author: Long, Yang
ISNI:       0000 0004 6424 0708
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2017
Image classification is one of the essential tasks of an intelligent visual system. Conventional image classification techniques rely on a large number of labelled images for supervised learning, which requires expensive human annotation. Towards truly intelligent systems, a more favourable approach is to teach the machine to classify using prior knowledge, as humans do. For example, a palaeontologist could recognise an extinct species purely from textual descriptions. To this end, Zero-Shot Image Classification (ZIC) is proposed, which aims to build machines that can learn to classify unseen images as humans do. The problem can be viewed at two levels. Low-level technical issues are addressed by the general Zero-Shot Learning (ZSL) problem, which considers how to train a classifier on the unseen visual domain using prior knowledge. High-level issues concern how to design and organise visual knowledge representation so as to construct a systematic ontology that could serve as an ultimate knowledge base for machines to learn from. This thesis aims to provide a thorough study of the ZIC problem, covering models, challenges and possible applications. In addition, each main chapter presents an original contribution made during my study. The first addresses the problem of Visual-Semantic Ambiguity: the same semantic concept (e.g. an attribute) can refer to a huge variety of visual features, and vice versa. Conventional ZSL methods usually adopt a one-way embedding that maps such high-variance visual features into the semantic space, which may lead to degraded performance. As a solution, a dual-graph regularised embedding algorithm named Visual-Semantic Ambiguity Removal (VSAR) is proposed, which captures the intrinsic local structure of both the visual and semantic spaces. In the intermediate embedding space, the structural difference is reconciled to remove the ambiguity.
The second contribution aims to circumvent costly visual data collection for conventional supervised classification by using ZSL techniques. The key idea is to synthesise visual features from semantic information, just as humans can imagine the features of an unseen class from a semantic description built on prior knowledge. New objects from unseen classes can then be classified in a conventional supervised framework using the inferred visual features. To overcome the correlation problem, we propose an intermediate Orthogonal Semantic-Visual Embedding (OSVE) space that removes the correlated redundancy. The proposed method achieves promising performance on fine-grained datasets. In the third contribution, the graph constraint of VSAR is incorporated to synthesise improved visual features. The orthogonal embedding is reconsidered as an Information Diffusion problem. Through an orthogonal rotation, the synthesised visual features become more discriminative. On four benchmarks, the proposed method demonstrates the advantages of synthesised visual features and significantly outperforms state-of-the-art results. Since most ZSL approaches rely heavily on expensive attributes, the fourth contribution of this thesis explores a more feasible yet more effective Semantic Simile model to describe unseen classes. From a group of similes, e.g. an unknown animal has the same parts as a wolf and its colour resembles that of a bobcat, implicit attributes are discovered by a graph-cut algorithm. Comprehensive experimental results suggest that the simile-based implicit attributes can significantly boost performance. To minimise the cost of building ontologies for ZIC, the final chapter introduces a novel scheme with which ZIC can be achieved using only a few similes for each unseen class. No annotations of seen classes are needed. Such an approach finally makes ZIC attribute-free, which significantly improves the feasibility of ZIC.
Unseen classes can be recognised in a conventional setting without an expensive attribute ontology. It can be concluded that the methods introduced in this thesis provide fundamental components of a zero-shot image classification system. The thesis also points out four core directions for future ZIC research.
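The feature-synthesis idea described in the abstract (infer visual features for an unseen class from its semantic description, then classify in an ordinary supervised way) can be illustrated with a minimal sketch. This is not the thesis's OSVE or Information Diffusion method: a plain ridge regression from attributes to class-mean features and a nearest-prototype classifier are swapped in for illustration. All function names, dimensions, the regularisation value `lam` and the synthetic toy data are assumptions made for this example only.

```python
import numpy as np

def fit_semantic_to_visual(attrs_seen, feats, labels, lam=1e-3):
    """Ridge regression from class attributes to class-mean visual features.

    Assumption: a linear map is enough for illustration; labels are 0..C-1.
    """
    n_classes, d_attr = attrs_seen.shape
    # One visual prototype (mean feature vector) per seen class.
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    # Closed-form ridge solution: W = (A^T A + lam * I)^{-1} A^T M
    gram = attrs_seen.T @ attrs_seen + lam * np.eye(d_attr)
    return np.linalg.solve(gram, attrs_seen.T @ protos)

def synthesise_prototypes(attrs_unseen, W):
    """Infer a visual prototype for each unseen class from its attributes."""
    return attrs_unseen @ W

def classify(feats, protos):
    """Assign each feature vector to the nearest prototype (Euclidean)."""
    d2 = ((feats[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def make_toy_data(rng, attrs, W_true, per_class, noise):
    """Hypothetical generator: features are a noisy linear image of attributes."""
    protos = attrs @ W_true
    labels = np.repeat(np.arange(attrs.shape[0]), per_class)
    feats = protos[labels] + noise * rng.standard_normal((labels.size, W_true.shape[1]))
    return feats, labels

def demo(seed=0):
    rng = np.random.default_rng(seed)
    d_attr, d_vis = 5, 20
    W_true = rng.standard_normal((d_attr, d_vis))
    attrs_seen = rng.standard_normal((30, d_attr))    # 30 seen classes
    attrs_unseen = rng.standard_normal((5, d_attr))   # 5 unseen classes
    X_seen, y_seen = make_toy_data(rng, attrs_seen, W_true, 20, 0.1)
    X_unseen, y_unseen = make_toy_data(rng, attrs_unseen, W_true, 20, 0.1)

    # Train only on seen classes, then synthesise unseen-class prototypes.
    W = fit_semantic_to_visual(attrs_seen, X_seen, y_seen)
    protos_unseen = synthesise_prototypes(attrs_unseen, W)
    pred = classify(X_unseen, protos_unseen)
    return float((pred == y_unseen).mean())

if __name__ == "__main__":
    print("unseen-class accuracy on toy data:", demo())
```

No image from an unseen class is used to fit `W`; the unseen classes enter only through their attribute vectors, which is what makes the setting zero-shot. Real visual features are far less linear in the attributes than this toy data, which is the gap that the embedding methods in the thesis are meant to address.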
Supervisor: Shao, Ling ; Xiaoli, Chu
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available