Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.626165
Title: Leveraging weak supervision for video understanding
Author: Garcia Cifuentes, C.
Awarding Body: University College London (University of London)
Current Institution: University College London (University of London)
Date of Award: 2013
Availability of Full Text:
Access through EThOS:
Full text unavailable from EThOS. Please try the link below.
Access through Institution:
Abstract:
This research deals with the challenging task of video classification, with a particular focus on action recognition, which is essential for a comprehensive understanding of videos. In the typical scenario, there is a list of semantic categories to be modeled, and example clips are given together with their associated category label, indicating which action of interest happens in each clip. No information is given about where or when the action happens, still less about why the annotator considered the clip to belong to a sometimes ambiguous category. Within the framework of the bag-of-words representation of videos, we explore how to leverage such weak labels from three points of view: (i) the use of coherent supervision from the earliest stages of the pipeline; (ii) the combination of features that are heterogeneous in nature and scale; and (iii) mid-level representations of videos based on regions, so as to increase the ability to discriminate relevant locations in the video. For the quantization of local features, we propose and evaluate a novel form of supervision to train random forests which explicitly aims at the discriminative power of the resulting bags of words. We show that our forests are better than traditional ones at incorporating contextual elements during quantization, and draw attention to the risk of naively combining features. We also show that mid-level representations carry complementary information that can improve classification. Moreover, we propose a novel application of video classification to tracking. We show that weak clip labels can be used to successfully classify videos into categories of dynamic models. In this way, we improve tracking by performing classification-based dynamic model selection.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.626165
DOI: Not available