Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.633748
Title: Multi-modal recognition of manipulation activities through visual accelerometer tracking, relational histograms, and user-adaptation
Author: Stein, Sebastian
Awarding Body: University of Dundee
Current Institution: University of Dundee
Date of Award: 2014
Availability of Full Text:
Access through EThOS:
Access through Institution:
Abstract:
Activity recognition research in computer vision and pervasive computing has made a remarkable trajectory from distinguishing full-body motion patterns to recognizing complex activities. Manipulation activities as occurring in food preparation are particularly challenging to recognize, as they involve many different objects, non-unique task orders and are subject to personal idiosyncrasies. Video data and data from embedded accelerometers provide complementary information, which motivates an investigation of effective methods for fusing these sensor modalities. This thesis proposes a method for multi-modal recognition of manipulation activities that combines accelerometer data and video at multiple stages of the recognition pipeline. A method for accelerometer tracking is introduced that provides for each accelerometer-equipped object a location estimate in the camera view by identifying a point trajectory that matches well the accelerometer data. It is argued that associating accelerometer data with locations in the video provides a key link for modelling interactions between accelerometer-equipped objects and other visual entities in the scene. Estimates of accelerometer locations and their visual displacements are used to extract two new types of features: (i) Reference Tracklet Statistics characterizes statistical properties of an accelerometer's visual trajectory, and (ii) RETLETS, a feature representation that encodes relative motion, uses an accelerometer's visual trajectory as a reference frame for dense tracklets. In comparison to a traditional sensor fusion approach where features are extracted from each sensor-type independently and concatenated for classification, it is shown that combining RETLETS and Reference Tracklet Statistics with those sensor-specific features performs considerably better. Specifically addressing scenarios in which a recognition system would be primarily used by a single person (e.g., cognitive situational support), this thesis investigates three methods for adapting activity models to a target user based on user-specific training data. Via randomized control trials it is shown that these methods indeed learn user idiosyncrasies. All proposed methods are evaluated on two new challenging datasets of food preparation activities that have been made publicly available. Both datasets feature a novel combination of video and accelerometers attached to objects. The Accelerometer Localization dataset is the first publicly available dataset that enables quantitative evaluation of accelerometer tracking algorithms. The 50 Salads dataset contains 50 sequences of people preparing mixed salads with detailed activity annotations.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.633748  DOI: Not available
Keywords: activity recognition ; sensor fusion ; computer vision ; accelerometers ; machine learning ; action recognition ; food preparation ; situational support ; user adaptation ; accelerometer localization ; tracking
Share: