Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.794524
Title: Temporal labelling for action recognition in videos
Author: Moltisanti, Davide
ISNI: 0000 0004 8500 072X
Awarding Body: University of Bristol
Current Institution: University of Bristol
Date of Award: 2019
Abstract:
Action recognition in computer vision is the task of understanding what a subject is doing in an environment. When performing recognition in videos, labels are typically provided in the form of a category class, along with the temporal boundaries of the action. Labelling action boundaries requires an annotator to decide when the action starts and ends. This is a subjective and arbitrary task, i.e. different people are likely to identify the start and the end of an action differently. As action boundaries vary, salient frames may be excluded and irrelevant frames included, which may influence the ability of a classifier to learn and detect actions. This thesis offers an insight into how action boundaries are perceived and how they can affect classification in videos. An important finding of this study is that accurate temporal labelling is crucial for learning discriminative representations of actions with current state-of-the-art methods. This thesis also proposes the Rubicon Boundaries, annotation guidelines inspired by work in cognitive psychology that aim to alleviate labelling ambiguity, in an attempt to foster more precise and consistent annotations. Action boundaries are not only arbitrary, but also expensive to annotate. This thesis proposes a novel level of temporal supervision for the task of action recognition, i.e. single timestamps roughly aligned with actions in untrimmed videos. Using this type of supervision, together with the proposed training algorithm, it is possible to achieve performance comparable to results obtained with full temporal supervision. The proposed method can operate under varying dataset complexity, highlighting that single timestamps constitute a good compromise between labelling effort and performance. Single timestamps also alleviate ambiguity, since annotators do not have to decide when the action starts and ends, but only have to mark one frame within or close to the action.
Supervisor: Aldamen, Dima ; Mayol-Cuevas, Walterio
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.794524
DOI: Not available