Title: Self-supervised learning using motion and visualizing convolutional neural networks
Author: Mahendran, Aravindh
ISNI: 0000 0004 7653 7840
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2018
Availability of Full Text:
Access from EThOS: Full text unavailable from EThOS.
Access from Institution:
We propose a novel method for learning convolutional image representations without manual supervision. We use motion, in the form of optical flow, to supervise representations of static images. Training a network to predict flow from a single image can be needlessly difficult due to intrinsic ambiguities in this prediction task. We instead propose two simpler learning goals: (a) embed pixels such that the similarity between their embeddings matches that between their optical-flow vectors (CPFS), or (b) segment the image such that optical flow within segments constitutes coherent motion (S3-CNN). At test time, the learned deep network can be used without access to video or flow information and transferred to various computer vision tasks such as image classification, detection, and segmentation. Our CPFS model achieves state-of-the-art results in self-supervision using motion cues, as demonstrated on standard transfer-learning benchmarks.

Despite this strong transfer-learning performance, we wish to visualize the representation learned by our self-supervised CPFS model. With that motivation we develop a suite of visualization methods and study several landmark representations, both shallow and deep. These visualizations are based on the concept of the "natural pre-image", that is, a natural-looking image whose representation has some notable property. We study three such visualizations: inversion, in which the aim is to reconstruct an image from its representation; activation maximization, in which we search for patterns that maximally stimulate a representation component; and caricaturization, in which the visual patterns that a representation detects in an image are exaggerated. We formulate these as a regularized energy-minimization framework and demonstrate its effectiveness. We show that our method can invert HOG features more accurately than recent alternatives, while also being applicable to CNNs.
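The CPFS goal of matching embedding similarity to optical-flow similarity can be illustrated with a toy loss. This is a minimal sketch only: the cosine and Gaussian kernels below are illustrative assumptions, not the similarity functions or training procedure used in the thesis.

```python
import numpy as np

def cpfs_style_loss(embeddings, flows, sigma=1.0):
    """Toy similarity-matching objective: pairwise similarities between
    pixel embeddings should agree with pairwise similarities between
    their optical-flow vectors.

    embeddings: (N, D) per-pixel feature vectors.
    flows:      (N, 2) optical-flow vectors for the same pixels.
    The kernel choices here are illustrative assumptions."""
    # Cosine similarity between pixel embeddings.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim_embed = e @ e.T
    # Flow similarity via a Gaussian kernel on squared flow distance.
    d2 = ((flows[:, None, :] - flows[None, :, :]) ** 2).sum(-1)
    sim_flow = np.exp(-d2 / (2.0 * sigma ** 2))
    # Penalize mismatch between the two similarity matrices.
    return ((sim_embed - sim_flow) ** 2).mean()
```

Pixels moving alike (small flow distance, similarity near 1) are pushed toward similar embeddings, and vice versa; the loss is zero when the two similarity matrices coincide.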
We apply these visualization techniques to our self-supervised CPFS model and contrast it with visualizations of a fully supervised AlexNet and a randomly initialized one.
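The pre-image idea of inversion as regularized energy minimization can be sketched in miniature. Below, the representation is a toy linear map (so gradients are closed-form) and the regularizer is a simple squared-norm penalty; the thesis instead inverts HOG and CNN features with natural-image priors, so every specific choice here is an assumption for illustration.

```python
import numpy as np

def invert_linear(W, target, lr=0.05, lam=0.01, steps=500):
    """Toy pre-image inversion: find x minimizing the energy
    ||W @ x - target||^2 + lam * ||x||^2 by gradient descent.
    W stands in for a differentiable representation; lam * ||x||^2
    stands in for a natural-image regularizer (illustrative only).
    lr must be small relative to the spectrum of W for convergence."""
    x = np.zeros(W.shape[1])
    for _ in range(steps):
        # Gradient of the data term plus gradient of the regularizer.
        grad = 2.0 * W.T @ (W @ x - target) + 2.0 * lam * x
        x -= lr * grad
    return x
```

For an invertible W and small lam, the recovered x approaches the true pre-image; the regularizer trades reconstruction accuracy for a preference over "natural-looking" solutions, which is the role the natural-image prior plays in the full framework.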
Supervisor: Vedaldi, Andrea
Sponsor: European Research Council ; BP ; Amazon
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Computer Vision ; Machine Learning