Title: Learning the structure of object categories from incomplete supervision
Author: Novotny, David
ISNI: 0000 0004 7971 6035
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2018
Availability of Full Text: Full text unavailable from EThOS.
This thesis aims to learn and predict the fine-grained structure of visual object categories from input image data. To alleviate the common requirement of collecting large amounts of manual annotations, we propose several approaches that learn from an incomplete supervisory signal. Specifically, we begin with an analysis of the amount of supervision needed to learn all visual variations of an object part. Motivated by these observations, we then propose a detector of semantic (i.e. nameable) parts that is supervised with inexpensive web image search data. The main challenge, handling a significant amount of annotation noise, is addressed with a novel geometry-appearance embedding. Moving away from semantic part detection, we then focus on learning generic mid-level elements for understanding the geometry of object categories. We propose a novel architecture that outputs a visual representation suitable for establishing image-to-image semantic correspondences. Its main contribution is a new discriminability diversity objective that facilitates learning sparse image features sensitive to changes in the geometry of the input. We later introduce a similar feature learning machine that leverages the equivariance constraint. Unlike existing alternatives, we adapt this method to a noisy training set by means of a novel probabilistic introspection framework, which allows it to selectively represent the image pixels that can potentially yield a correct match. Inspired by the ability of deep networks to decompose an object into a constellation of pixel-perfect landmarks, we then address the opposite problem of grouping the image pixels that belong to an object. More specifically, we tackle the instance segmentation problem with a deep convolutional architecture that "colors" image pixels with their instance labels.
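The equivariance constraint mentioned above can be made concrete with a minimal sketch. It assumes, for illustration only, that the transformation g is a circular image translation and that the dense feature extractor is a toy hand-built filter bank; the names `conv_features`, `warp`, `positional_features` and `equivariance_loss` are hypothetical and are not the thesis's code. The constraint Φ(g x)(g u) = Φ(x)(u) is measured here as a mean squared violation over all pixels u; the objective actually used to train the networks in the thesis is not reproduced, and the squared error is only a simplified stand-in.

```python
import numpy as np

def conv_features(img):
    """Toy dense descriptor (hypothetical): a 3x3 box filter and two finite
    differences, all built from circular shifts, so it commutes with np.roll."""
    box = sum(np.roll(img, (dy, dx), axis=(0, 1))
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    gx = np.roll(img, -1, axis=1) - img  # horizontal difference
    gy = np.roll(img, -1, axis=0) - img  # vertical difference
    return np.stack([box, gx, gy], axis=-1)  # shape (H, W, 3)

def warp(arr, shift):
    """Assumed transformation g: circular translation by (dy, dx) pixels."""
    return np.roll(arr, shift, axis=(0, 1))

def equivariance_loss(phi, img, shift):
    """Mean squared violation of phi(g x)(g u) = phi(x)(u) over all pixels u.
    Warping phi(x) by g moves the descriptor of pixel u to location g(u)."""
    lhs = phi(warp(img, shift))
    rhs = warp(phi(img), shift)
    return float(np.mean((lhs - rhs) ** 2))

def positional_features(img):
    """Position-dependent variant: adds the row index to every channel."""
    return conv_features(img) + np.arange(img.shape[0])[:, None, None]

rng = np.random.default_rng(0)
img = rng.random((16, 16))

print(equivariance_loss(conv_features, img, (3, -5)))        # 0.0
print(equivariance_loss(positional_features, img, (3, -5)))  # about 39, clearly > 0
```

The purely convolutional features satisfy the constraint exactly, while adding the pixel position breaks it. This position sensitivity is exactly what plain convolutional features lack, which becomes relevant for instance coloring.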
We identify the convolutional coloring dilemma, a drawback of standard position-agnostic networks that prevents them from coloring pixels with their instance labels, and propose a correction based on a novel position-sensitive semi-convolutional operator. The last task we tackle is learning the 3D shapes of object categories. Inspired by the human visual system, we describe a deep network that learns by observing an object category in a sequence of videos. Our final contribution is a probabilistic learning scheme that increases the robustness of network training and enables confidence predictions at test time. This is achieved by explicitly modeling the distribution of training errors caused either by insufficiencies of the model or by noise in the ground-truth annotations.
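A toy example can illustrate both the convolutional coloring dilemma and the semi-convolutional fix. The sketch assumes the simplest, additive form of a semi-convolutional operator, in which the pixel coordinates (y, x) are added to the first two channels of a convolutional embedding. The helper names are hypothetical, and `local_offsets` is a hand-crafted stand-in for a trained network that predicts each pixel's offset to the centre of its 3x3 square from local appearance alone; none of this is the thesis's implementation. Because `local_offsets` is built from circular shifts, it is translation-equivariant, so two identical squares receive identical sets of "colors". Adding the coordinates maps every pixel of a square to that square's centre, giving one distinct color per instance.

```python
import numpy as np

def local_offsets(mask):
    """Hypothetical stand-in for a convolutional network: for 3x3 squares on an
    empty background, predicts each pixel's (dy, dx) offset to its square's
    centre from its 4-neighbourhood only, so it is translation-equivariant."""
    left = np.roll(mask, 1, axis=1)    # value of the left neighbour
    right = np.roll(mask, -1, axis=1)  # value of the right neighbour
    up = np.roll(mask, 1, axis=0)      # value of the upper neighbour
    down = np.roll(mask, -1, axis=0)   # value of the lower neighbour
    dx = right - left  # +1 on the left edge, -1 on the right edge, 0 in the middle
    dy = down - up     # +1 on the top edge, -1 on the bottom edge, 0 in the middle
    return np.stack([dy, dx], axis=-1) * mask[..., None]  # zero on background

def semi_convolutional(features):
    """Additive semi-convolutional operator (assumed form): add each pixel's
    (row, column) coordinates to the first two embedding channels."""
    h, w = features.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    emb = features.astype(float)  # astype returns a copy
    emb[..., 0] += ys
    emb[..., 1] += xs
    return emb

# Two identical 3x3 objects at different positions in a 10x10 image.
mask = np.zeros((10, 10), dtype=int)
mask[1:4, 1:4] = 1
mask[1:4, 6:9] = 1

conv_emb = local_offsets(mask)
semi_emb = semi_convolutional(conv_emb)

# Number of distinct "colors" among foreground pixels.
print(len(np.unique(conv_emb[mask == 1], axis=0)))  # 9, each shared by both squares
print(len(np.unique(semi_emb[mask == 1], axis=0)))  # 2, one per instance
```

In this toy, grouping foreground pixels by their embedding vectors recovers the two instances exactly; with learned, noisy embeddings a clustering step would be needed instead.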
Supervisor: Larlus, Diane ; Vedaldi, Andrea
Sponsor: Naver Labs Europe
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Computer vision ; Artificial Intelligence ; 3D reconstruction ; Machine Learning