Title: Sketch based image retrieval on big visual data
Author: Bui, Tu
ISNI: 0000 0004 7657 3614
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2019
The deluge of visual content on the Internet - from user-generated content to commercial image collections - motivates intuitive new methods for searching digital image content: how can we find certain images in a database of millions? Sketch-based image retrieval (SBIR) is an emerging research topic in which a free-hand drawing can be used to visually query photographic images. SBIR is aligned to emerging trends for visual content consumption on mobile touch-screen devices, for which gestural interactions such as sketch are a natural alternative to textual input. This thesis presents several contributions to the literature of SBIR. First, we propose a cross-domain learning framework that maps both sketches and images into a joint embedding space invariant to depictive style, while preserving semantics. The resulting embedding enables direct comparison and search between sketches and images and is based upon a multi-branch convolutional neural network (CNN) trained using unique parameter sharing and training schemes. The deeply learned embedding is shown to yield state-of-the-art retrieval performance on several SBIR benchmarks. Second, in two separate works we propose to disambiguate sketched queries by combining sketched shape with a secondary modality: SBIR with colour and SBIR with aesthetic context. The former enables querying with coloured line-art sketches. Colour and shape features are extracted locally using a modified version of gradient field orientation histogram (GF-HoG) before being globally pooled using dictionary learning. Various colour-shape fusion strategies are explored, coupled with an efficient indexing scheme for fast retrieval performance. The latter supports querying using a sketched shape accompanied by one or several images serving as an aesthetic constraint governing the visual style of search results.
We propose to model structure and style separately, disentangling one modality from the other, and then learn structure-style fusion using a hierarchical triplet network. This method enables further studies beyond SBIR such as style blending, style analogy and retrieval with alternative-modal queries. Third, we explore mid-grain SBIR - a novel field requiring retrieved images to match both the category and the key visual characteristics of the sketch without demanding fine-grain, instance-level matching of a specific object. We study a semi-supervised approach that requires mainly class-labelled sketches and images plus a small number of instance-labelled sketch-image pairs. This approach involves aligning sketch and image embeddings before pooling them into clusters from which mid-grain similarity may be measured. Our learned model demonstrates not only intra-category discrimination (mid-grain) but also improved inter-category discrimination (coarse-grain) on a newly created MidGrain65c dataset.
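The abstract's core idea of a joint embedding trained with a triplet objective can be illustrated with a minimal sketch. This is not the thesis implementation: the two-dimensional embeddings, the margin value, and the helper names (`l2_distance`, `triplet_loss`, `rank_images`) are hypothetical stand-ins for the outputs of trained CNN branches.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the anchor is closer to the
    positive than to the negative by at least `margin` (assumed value)."""
    return max(0.0, l2_distance(anchor, positive)
                    - l2_distance(anchor, negative) + margin)

def rank_images(sketch_embedding, image_embeddings):
    """Return image ids ordered from nearest to farthest from the sketch."""
    return sorted(image_embeddings,
                  key=lambda k: l2_distance(sketch_embedding, image_embeddings[k]))

# Toy embeddings (hypothetical): a cat sketch, a cat photo and a car photo.
sketch = [0.9, 0.1]
images = {"cat_photo": [1.0, 0.0], "car_photo": [0.0, 1.0]}

ranking = rank_images(sketch, images)
loss = triplet_loss(sketch, images["cat_photo"], images["car_photo"])
```

In this toy case the cat photo ranks first and the loss is zero, because the matching photo already lies much closer to the sketch than the non-matching one. Swapping the positive and negative examples gives a positive loss, which is the signal that would pull a real network's embeddings into alignment during training.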
Supervisor: Collomosse, John
Sponsor: EPSRC
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral