Title: Pixel-level scene understanding with deep structured models
Author: Arnab, Anurag
ISNI: 0000 0004 8508 0045
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2019
Availability of Full Text:
Full text unavailable from EThOS.
Although humans can effortlessly recognise a scene in its totality, doing so is extremely challenging for computers, which is why scene understanding remains one of the fundamental problems in computer vision. This thesis concentrates on pixel-level scene understanding tasks such as semantic and instance segmentation, which have applications in diverse fields including autonomous vehicles, medical diagnosis and assistive technologies for the partially sighted. Firstly, this thesis addresses the task of semantic segmentation by integrating mean-field inference of a Conditional Random Field (CRF) with higher-order potentials directly into a deep neural network. This approach enables joint, end-to-end training of the parameters of both the CRF and the underlying convolutional neural network (CNN), and achieved state-of-the-art results on public leaderboards at the time of publication. This method is then extended to the task of instance segmentation. In contrast to previous work, the proposed formulation jointly processes all instances in the image. As a result, each pixel can be assigned to only one instance, and the network must therefore learn to reason about occlusions between instances. Moreover, unlike previous work, this approach can naturally segment "stuff" classes. This method also achieved state-of-the-art results at the time of publication. Because pixel-level training data for segmentation is time-consuming and thus expensive to obtain, this thesis then proposes a method of training semantic and instance segmentation models with weaker supervision. In particular, annotations in the form of bounding boxes and image-level tags are considered, which are shown to significantly reduce annotation time with a relatively small impact on final performance compared to a fully supervised baseline. Finally, this thesis studies the adversarial robustness of popular semantic segmentation architectures. This topic is motivated by the fact that, during the course of this thesis, segmentation systems have become accurate enough to use in real-world applications, and thus the security of models deployed in production is critical. The effect of various architectural components on adversarial robustness is thoroughly evaluated, and mean-field inference of CRFs and multiscale processing (and, more generally, input transformations) are shown to naturally implement concurrently proposed adversarial defences.
Supervisor: Torr, Philip
Sponsor: Engineering and Physical Sciences Research Council ; Clarendon Fund ; European Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Machine Learning ; Computer Vision ; Artificial Intelligence