Title: Human motion analysis
Author: Jin, Ning
ISNI: 0000 0001 3590 7102
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2007
In recent years, human motion analysis has become one of the most active research areas in computer vision, motivated by a wide spectrum of promising applications such as intelligent visual surveillance, content-based video retrieval and human-computer interaction. Vision-based human motion analysis aims to detect, track and identify humans in image sequences or videos, and then to recognize their motions. The goal of this research is to provide a robust system for human motion analysis, with the main focus on human motion recognition.
Almost all human motion analysis begins with motion segmentation, which extracts the foreground objects of interest from the background. This thesis implements two background subtraction algorithms, in which the background is modeled by a single Gaussian and by a mixture of Gaussians, respectively; both models are adapted recursively online to cope with dynamic changes in the background.
Visual tracking, the process of locating moving objects over time, follows naturally from motion segmentation. In this thesis, we present a multiple-human tracking system in which a motion model and an appearance model (covering color and shape) are built and maintained for each object. The centroid of each object is tracked over time by a Kalman filter. To match multiple detected objects with multiple tracked models, we build for each object a model of its color distribution, using a histogram and, alternatively, a mixture of Gaussians, together with a simple shape model based on the aspect ratio of its bounding box.
We believe that shape dynamics, i.e., the spatio-temporal variations of shape during a motion, provide many clues for visually recognizing that motion. This thesis presents an approach to view-dependent human motion recognition.
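The single-Gaussian background model described above can be sketched per pixel as follows. This is a minimal illustration, not the thesis's implementation: it assumes flattened grayscale frames, a running-average update with a hypothetical learning rate `alpha`, a foreground test at `k` standard deviations, a variance floor `min_var`, and (as one common choice) adaptation of only those pixels classified as background.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class GaussianBackground:
    """Per-pixel single-Gaussian background model for flattened grayscale frames.

    alpha, k, init_var and min_var are illustrative values, not the thesis's settings.
    """
    alpha: float = 0.05      # learning rate of the recursive update
    k: float = 2.5           # foreground threshold in standard deviations
    init_var: float = 100.0  # variance given to every pixel on the first frame
    min_var: float = 4.0     # floor that stops the variance collapsing to zero
    mean: List[float] = field(default_factory=list)
    var: List[float] = field(default_factory=list)

    def apply(self, frame: List[float]) -> List[bool]:
        """Return a foreground mask for `frame`, then adapt the model online."""
        if not self.mean:  # the first frame initializes the background
            self.mean = list(frame)
            self.var = [self.init_var] * len(frame)
            return [False] * len(frame)
        mask = []
        for i, x in enumerate(frame):
            d = x - self.mean[i]
            is_fg = abs(d) > self.k * math.sqrt(self.var[i])
            mask.append(is_fg)
            if not is_fg:  # only background pixels update the model (an assumption)
                self.mean[i] += self.alpha * d
                self.var[i] = max(self.min_var,
                                  (1 - self.alpha) * self.var[i] + self.alpha * d * d)
        return mask


bg = GaussianBackground()
bg.apply([100.0, 100.0, 100.0])
print(bg.apply([101.0, 100.0, 200.0]))  # [False, False, True]
```

Freezing foreground pixels keeps a person who stops briefly from being absorbed into the background, at the cost of slower recovery when the scene itself changes.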
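For the mixture-of-Gaussians variant, a widely used scheme is the Stauffer-Grimson adaptive mixture. The sketch below applies a simplified version of it to a single grayscale pixel. The class name, all parameter values, and the use of one learning rate `alpha` for both the weights and the component parameters are assumptions made for brevity; the thesis's exact update rules may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float
    mean: float
    var: float


class PixelMixture:
    """Adaptive mixture of Gaussians for one grayscale pixel (Stauffer-Grimson style sketch)."""

    def __init__(self, k=3, alpha=0.01, match_sigma=2.5, bg_ratio=0.7,
                 init_var=225.0, init_weight=0.05, min_var=1.0):
        self.k, self.alpha = k, alpha
        self.match_sigma, self.bg_ratio = match_sigma, bg_ratio
        self.init_var, self.init_weight, self.min_var = init_var, init_weight, min_var
        self.comps: List[Component] = []

    def update(self, x: float) -> bool:
        """Classify x (True means foreground), then adapt the mixture."""
        # Rank components: high weight and low variance look most like background.
        self.comps.sort(key=lambda c: c.weight / math.sqrt(c.var), reverse=True)
        matched = next((c for c in self.comps
                        if abs(x - c.mean) < self.match_sigma * math.sqrt(c.var)), None)
        # The background is the shortest ranked prefix whose total weight exceeds bg_ratio.
        background, total = [], 0.0
        for c in self.comps:
            background.append(c)
            total += c.weight
            if total > self.bg_ratio:
                break
        is_foreground = matched is None or not any(c is matched for c in background)
        # Every weight decays; the matched component is reinforced.
        for c in self.comps:
            c.weight = (1 - self.alpha) * c.weight + (self.alpha if c is matched else 0.0)
        if matched is not None:
            d = x - matched.mean
            matched.mean += self.alpha * d
            matched.var = max(self.min_var, (1 - self.alpha) * matched.var + self.alpha * d * d)
        else:
            new = Component(self.init_weight, x, self.init_var)
            if len(self.comps) < self.k:
                self.comps.append(new)
            else:
                self.comps[-1] = new  # replace the lowest-ranked component
        total_weight = sum(c.weight for c in self.comps)
        for c in self.comps:
            c.weight /= total_weight
        return is_foreground


pixel = PixelMixture()
stream = [50.0] * 30 + [200.0]
print([pixel.update(v) for v in stream][-3:])  # [False, False, True]
```

With no components yet, the very first value is reported as foreground. A full background subtractor would keep one such mixture per pixel.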
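The centroid tracking step can be illustrated with a constant-velocity Kalman filter. The sketch below assumes that the x and y coordinates evolve independently, so each is tracked by its own two-state filter [position, velocity]. The class names, the noise parameters `q` and `r`, and the initial covariance are illustrative choices, not values reported in the thesis.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ConstantVelocity1D:
    """Kalman filter for one coordinate with state [position, velocity]."""
    pos: float
    vel: float = 0.0
    p00: float = 1.0    # covariance entries (the matrix is symmetric)
    p01: float = 0.0
    p11: float = 100.0  # large initial velocity uncertainty
    q: float = 0.01     # process-noise intensity (illustrative)
    r: float = 1.0      # measurement-noise variance (illustrative)

    def predict(self, dt: float = 1.0) -> float:
        # x <- F x with F = [[1, dt], [0, 1]]
        self.pos += dt * self.vel
        # P <- F P F^T + Q, with Q = q * G G^T and G = [dt^2 / 2, dt]
        g0, g1 = 0.5 * dt * dt, dt
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11 + self.q * g0 * g0
        p01 = self.p01 + dt * self.p11 + self.q * g0 * g1
        p11 = self.p11 + self.q * g1 * g1
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, z: float) -> None:
        # Measurement model z = H x + noise with H = [1, 0]
        s = self.p00 + self.r                 # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s   # Kalman gain
        y = z - self.pos                      # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        # P <- (I - K H) P
        p00 = (1 - k0) * self.p00
        p01 = (1 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11


class CentroidTracker:
    """Tracks a blob centroid with independent x and y filters (a simplifying assumption)."""

    def __init__(self, x: float, y: float):
        self.fx, self.fy = ConstantVelocity1D(x), ConstantVelocity1D(y)

    def step(self, centroid: Tuple[float, float]) -> Tuple[float, float]:
        """Predict, then correct with the detected centroid; returns the prediction."""
        predicted = (self.fx.predict(), self.fy.predict())
        self.fx.update(centroid[0])
        self.fy.update(centroid[1])
        return predicted


tracker = CentroidTracker(0.0, 0.0)
for t in range(1, 31):
    tracker.step((2.0 * t, 1.0 * t))
# The velocity estimates approach the true 2 and 1 pixels per frame.
print(tracker.fx.vel, tracker.fy.vel)
```

The predicted centroid returned by `step` is what a tracker would compare with new detections before the correction is applied.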
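Matching detections to tracked models with the histogram-based color model and the bounding-box aspect ratio might look like the following. The Bhattacharyya coefficient as histogram similarity, the weighting `w_color`, the threshold `min_score`, and the greedy best-pair-first assignment are all assumptions for illustration. The mixture-of-Gaussians color model is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def color_histogram(pixels: Sequence[RGB], bins: int = 8) -> Dict[RGB, float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {key: c / len(pixels) for key, c in counts.items()}


def bhattacharyya(h1: Dict[RGB, float], h2: Dict[RGB, float]) -> float:
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(math.sqrt(p * h2.get(key, 0.0)) for key, p in h1.items())


def aspect_ratio(box: Box) -> float:
    x0, y0, x1, y1 = box
    return (y1 - y0) / (x1 - x0)  # height over width


def match_score(hist_a, box_a, hist_b, box_b, w_color: float = 0.7) -> float:
    """Weighted sum of color similarity and aspect-ratio similarity."""
    ra, rb = aspect_ratio(box_a), aspect_ratio(box_b)
    return w_color * bhattacharyya(hist_a, hist_b) + (1 - w_color) * min(ra, rb) / max(ra, rb)


def greedy_assign(scores: List[List[float]], min_score: float = 0.5) -> Dict[int, int]:
    """Map detection index -> track index, taking the best-scoring pairs first."""
    pairs = sorted(((s, d, t) for d, row in enumerate(scores) for t, s in enumerate(row)),
                   reverse=True)
    assigned: Dict[int, int] = {}
    used_tracks = set()
    for s, d, t in pairs:
        if s < min_score:
            break
        if d not in assigned and t not in used_tracks:
            assigned[d] = t
            used_tracks.add(t)
    return assigned


red = color_histogram([(250, 10, 10)] * 50)
blue = color_histogram([(10, 10, 250)] * 50)
tracks = [(red, (0, 0, 10, 30)), (blue, (0, 0, 12, 30))]
detections = [(blue, (5, 5, 17, 35)), (red, (5, 5, 15, 35))]
scores = [[match_score(hd, bd, ht, bt) for ht, bt in tracks] for hd, bd in detections]
# Detection 0 goes to track 1 (blue) and detection 1 to track 0 (red).
print(greedy_assign(scores))
```

A globally optimal assignment (for example the Hungarian algorithm) could replace the greedy loop when many people overlap.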
In our approach to view-dependent recognition, Procrustes shape analysis and the curvature scale space (CSS) technique are each used to represent 2D human body contours numerically. To model the spatio-temporal shape changes in a motion, we propose two mathematical tools: a linear dynamical system and an improved hidden Markov model (HMM). In the traditional exemplar-based HMM framework, the hidden states are typically coupled with the training data, which causes a number of problems in the learning procedure. We therefore introduce a non-parametric HMM approach that uses a discrete-output HMM with arbitrary hidden states, decoupled from the training data, to learn the shape dynamics directly from large amounts of training data. A non-parametric kernel density estimation algorithm is applied to learn the observation probability distribution, compensating for the uncertainty introduced by the arbitrary hidden states; this improves the HMM training procedure. We also extend the proposed approach to automatic motion segmentation. Here, motion segmentation has a different meaning from the one above: it refers to detecting the points in time at which people change their motions.
View-dependent human motion recognition assumes that all motion sequences, for both training and testing, are captured from a single viewpoint, and ignores the issue of view invariance; it is therefore of limited practical use. In this thesis, we present a novel approach to view-invariant human motion recognition. An image-based visual hull, computed from a set of silhouettes, explicitly represents the 3D shape of an object. We then use the set of silhouettes to represent the visual hull implicitly. Because a silhouette is the 2D projection of an object onto a particular camera in the 3D world, and is therefore sensitive to viewpoint, our multi-silhouette representation of the visual hull requires correspondence between views.
To guarantee this correspondence, we define a canonical multi-camera system and a canonical orientation of the human body during motion. We then “normalize” all the constructed visual hulls into the canonical multi-camera system, align them to the canonical orientation and finally render them, so that the rendered views satisfy the correspondence requirement. In our implicit representation of the visual hull, each silhouette is represented by a fixed number of points sampled on its closed contour; the 3D shape information is therefore implicitly encoded in the concatenation of multiple 2D contours. Each motion class is then learned by a hidden Markov model (HMM) with mixture-of-Gaussians outputs.
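The Procrustes shape analysis used in the view-dependent approach can be illustrated by aligning one contour to another after removing translation, scale and rotation. This sketch assumes each contour is given as the same number of corresponding 2D points and uses the complex-number formulation of full Procrustes fitting; the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def preshape(contour: Sequence[Point]) -> List[complex]:
    """Remove translation and scale: center the landmarks and give them unit norm."""
    z = [complex(x, y) for x, y in contour]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]


def procrustes_align(reference: Sequence[Point], contour: Sequence[Point]):
    """Rotate and scale `contour` onto `reference`; return (aligned, full Procrustes distance).

    Both contours must list the same number of corresponding landmarks.
    """
    z, w = preshape(reference), preshape(contour)
    # Least-squares complex factor a (scale times rotation) minimizing sum |z_i - a w_i|^2.
    a = sum(wi.conjugate() * zi for wi, zi in zip(w, z))
    aligned = [a * wi for wi in w]
    distance = math.sqrt(max(0.0, 1.0 - abs(a) ** 2))
    return aligned, distance


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rot = cmath.exp(1j * math.pi / 6)  # 30 degree rotation
moved = [((3 * rot * complex(x, y)).real + 5, (3 * rot * complex(x, y)).imag - 2)
         for x, y in square]
aligned, d_same = procrustes_align(square, moved)  # d_same is close to 0
_, d_rect = procrustes_align(square, [(0, 0), (2, 0), (2, 1), (0, 1)])  # about 0.316
print(d_same, d_rect)
```

The aligned coordinates of each frame, rather than the raw contour, are what a shape-dynamics model would consume.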
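The curvature scale space (CSS) representation mentioned above records where the curvature of a progressively smoothed closed contour changes sign. The sketch below computes those zero crossings for a few smoothing scales `sigma`. The central-difference derivatives, the 3-sigma kernel truncation and the toy peanut-shaped contour are assumptions, and a full CSS descriptor would additionally extract the maxima of the resulting (position, scale) image.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def gaussian_smooth_closed(values: Sequence[float], sigma: float) -> List[float]:
    """Circular convolution with a sampled, normalized Gaussian (closed contour)."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(j * j) / (2 * sigma * sigma)) for j in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(values)
    return [sum(kernel[j + radius] * values[(i + j) % n] for j in range(-radius, radius + 1))
            / total for i in range(n)]


def curvature(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Signed curvature (x'y'' - y'x'') / (x'^2 + y'^2)^1.5 via central differences."""
    n = len(xs)
    out = []
    for i in range(n):
        nxt, prv = (i + 1) % n, i - 1  # index -1 wraps to the last point
        dx, dy = (xs[nxt] - xs[prv]) / 2, (ys[nxt] - ys[prv]) / 2
        ddx, ddy = xs[nxt] - 2 * xs[i] + xs[prv], ys[nxt] - 2 * ys[i] + ys[prv]
        out.append((dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5)
    return out


def css_zero_crossings(contour: Sequence[Point],
                       sigmas: Sequence[float]) -> Dict[float, List[int]]:
    """For each scale, indices i where curvature changes sign between i and i + 1."""
    xs, ys = [p[0] for p in contour], [p[1] for p in contour]
    result = {}
    for sigma in sigmas:
        k = curvature(gaussian_smooth_closed(xs, sigma), gaussian_smooth_closed(ys, sigma))
        result[sigma] = [i for i in range(len(k)) if k[i] * k[(i + 1) % len(k)] < 0]
    return result


n = 128
angles = [2 * math.pi * i / n for i in range(n)]
peanut = [((1 + 0.5 * math.cos(2 * t)) * math.cos(t), (1 + 0.5 * math.cos(2 * t)) * math.sin(t))
          for t in angles]
css = css_zero_crossings(peanut, [1.0, 20.0])
# The two concavities give crossings at the fine scale and disappear at the coarse scale.
print({sigma: len(idx) for sigma, idx in css.items()})
```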
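One simple form of linear dynamical system for shape dynamics is a first-order model x_{t+1} = A x_t on per-frame shape feature vectors. The sketch below fits A by least squares for each motion class and recognizes a sequence by the model with the smallest one-step prediction error. Treating the state as fully observed, the small `ridge` term and the toy damped-rotation data are simplifying assumptions, not the thesis's formulation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for small feature dimensions)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_lds(seq: Sequence[Vector], ridge: float = 1e-9) -> Matrix:
    """Least squares: A = (sum x_{t+1} x_t^T)(sum x_t x_t^T + ridge I)^-1."""
    d = len(seq[0])
    xx = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    yx = [[0.0] * d for _ in range(d)]
    for prev, nxt in zip(seq, seq[1:]):
        for i in range(d):
            for j in range(d):
                xx[i][j] += prev[i] * prev[j]
                yx[i][j] += nxt[i] * prev[j]
    inv = invert(xx)
    return [[sum(yx[i][k] * inv[k][j] for k in range(d)) for j in range(d)] for i in range(d)]


def prediction_error(a: Matrix, seq: Sequence[Vector]) -> float:
    """Mean squared one-step prediction error of model `a` on `seq`."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        pred = [sum(a[i][j] * prev[j] for j in range(len(prev))) for i in range(len(prev))]
        total += sum((p - q) ** 2 for p, q in zip(pred, nxt))
    return total / (len(seq) - 1)


def rotation_sequence(angle: float, steps: int) -> List[Vector]:
    """Toy 2D 'shape feature' trajectory: a damped rotation (purely illustrative data)."""
    c, s = 0.95 * math.cos(angle), 0.95 * math.sin(angle)
    seq = [[1.0, 0.0]]
    for _ in range(steps):
        x, y = seq[-1]
        seq.append([c * x - s * y, s * x + c * y])
    return seq


models = {"ccw": fit_lds(rotation_sequence(0.2, 40)), "cw": fit_lds(rotation_sequence(-0.2, 40))}
test_seq = rotation_sequence(0.2, 20)
best = min(models, key=lambda name: prediction_error(models[name], test_seq))
print(best)  # ccw
```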
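How an HMM whose observation probabilities come from kernel density estimates can score a sequence is sketched below with the scaled forward algorithm. It illustrates evaluation only, not the thesis's training procedure: the two-state left-to-right topology, the 1D feature values, the Gaussian kernel and its bandwidth are hypothetical, and each state's density is built directly from a handful of made-up samples.

```python
import math
from typing import Callable, List, Sequence

Density = Callable[[float], float]


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Density:
    """1D Gaussian kernel density estimate built from training feature values."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def forward_log_likelihood(obs: Sequence[float], start: List[float],
                           trans: List[List[float]], emissions: List[Density]) -> float:
    """Scaled forward algorithm: log p(obs | HMM) with continuous emission densities."""
    n = len(start)
    log_lik = 0.0
    alpha: List[float] = []
    for t, x in enumerate(obs):
        if t == 0:
            alpha = [start[i] * emissions[i](x) for i in range(n)]
        else:
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emissions[i](x)
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik


start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]  # left-to-right topology
low = gaussian_kde([0.0, 0.1, -0.1], bandwidth=0.2)
high = gaussian_kde([1.0, 0.9, 1.1], bandwidth=0.2)
class_models = {"rise": [low, high], "fall": [high, low]}
sequence = [0.0, 0.1, 0.0, 0.9, 1.0, 1.1]
scores = {name: forward_log_likelihood(sequence, start, trans, em)
          for name, em in class_models.items()}
print(max(scores, key=scores.get))  # rise
```

Recognition picks the class whose HMM gives the highest log-likelihood; the per-step rescaling keeps long sequences from underflowing.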
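In the view-invariant representation, each silhouette contour is reduced to a fixed number of sampled points and the contours from all views are concatenated. A minimal sketch, assuming sampling uniformly by arc length that starts at the contour's first vertex (a consistent starting point across frames and views is taken as given); the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def resample_closed(contour: Sequence[Point], n: int) -> List[Point]:
    """Return n points equally spaced by arc length along a closed polygon.

    Sampling starts at contour[0]; a consistent start point is assumed.
    """
    pts = list(contour) + [contour[0]]  # close the polygon
    seg = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
    total = sum(seg)
    out: List[Point] = []
    i, acc = 0, 0.0  # current segment and arc length at its start
    for k in range(n):
        target = total * k / n
        while acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = (target - acc) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = pts[i], pts[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def multi_view_feature(contours: Sequence[Sequence[Point]], n: int) -> List[float]:
    """Concatenate the resampled (x, y) coordinates of one silhouette contour per view."""
    feature: List[float] = []
    for contour in contours:
        for x, y in resample_closed(contour, n):
            feature.extend((x, y))
    return feature


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(multi_view_feature([square, square, square], 8)))  # 48 = 3 views * 8 points * 2
```

Fixing the number of points per view gives every frame a feature vector of the same length, which is what a per-class HMM with vector-valued outputs requires.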
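For the per-class HMMs with mixture-of-Gaussians outputs, each state needs the log-density of a Gaussian mixture evaluated on the concatenated contour vector. The sketch below assumes diagonal covariances and evaluates the mixture with the log-sum-exp trick; the component format `(weight, mean, variance)` is a hypothetical choice.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, mean vector, diagonal variance vector).
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_mixture_emission(x: Sequence[float], components: Sequence[Component]) -> float:
    """log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp for stability."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    peak = max(logs)
    return peak + math.log(sum(math.exp(lp - peak) for lp in logs))


state_mixture = [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [3.0, 3.0], [0.5, 0.5])]
print(log_mixture_emission([0.2, -0.1], state_mixture))
```

This value can serve as the state-conditional emission term inside a log-space forward recursion; with high-dimensional contour vectors, working in log space avoids the underflow that raw densities would cause.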
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available