Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.640893
Title: Implicit models for automatic pose estimation in static images
Author: Holt, Brian D.
ISNI: 0000 0004 5349 1173
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2015
Abstract:
Automatic human pose estimation is one of the major topics in computer vision. It is a challenging problem, with applications in gaming, human-computer interaction, markerless motion capture, video analysis, and action and gesture recognition. This thesis addresses the problem of automatically estimating the two-dimensional articulated pose of a human in static range images. Implicit models of pose are trained to efficiently predict the locations of human body parts in static images from easily computed depth features. While most prior work has focused on pose estimation in RGB images, range data is used as the basis for this approach because it provides additional information and invariances that can be leveraged to improve estimation accuracy. Three main contributions are each described in their own chapter.

The first contribution is a novel method that estimates articulated pose by detecting poselets and accumulating predictions from the detections. A basic assumption throughout the part-based pose estimation literature is that a 'part' should correspond closely to an anatomical subdivision of the body, such as 'hand' or 'forearm', but such a subdivision is not necessarily the most salient feature for visual recognition. A part that corresponds to a highly deformable anatomical region is even more difficult to detect reliably, which makes it susceptible to high levels of false positive detections. By contrast, a description such as 'half a frontal face and shoulder' or 'legs in a scissor shape' may be far easier to detect reliably. The concept of a poselet, defined as a set of parts that are 'tightly clustered in configuration space and appearance space', is employed as the representation, and detectors are trained on poselets extracted from the dataset. Metadata such as the direction and distance from each poselet to each landmark is stored in a database. At test time, a multiscale scanning window is applied over the image, and the trained poselet detectors that activate cast their stored offset metadata as votes into Hough accumulator images of the landmark locations. Furthermore, an inference step that exploits the natural hierarchy of the body improves limb estimation.

The second contribution of this thesis is to cast the pose estimation task as a continuous non-linear regression problem. It is demonstrated that this problem can be effectively addressed by Random Regression Forests. This approach differs from a part-based classification approach in that there are no part detectors at any scale. Instead, the approach is more direct: binary comparison features are computed efficiently at each pixel and used to vote for body parts. The votes are accumulated in Hough accumulator images, and the most likely hypothesis is taken as the peak in a winner-takes-all approach. A new dataset of aligned range and RGB data is contributed, with annotations for 25,000 images across 12 subjects.

The final chapter of this thesis describes a novel conditional regression model based on poselet detectors. A second contribution of this chapter is the development of a geodesic-based method that, combined with estimates of rigid parts, delivers significantly higher predictive accuracy on deformable parts. Intuitively, deformable parts such as the hands correspond to geodesic extrema, which can be found using geodesic distances, leading to a further improvement in the accuracy of the model. A geodesic mesh is constructed from the underlying range data, and labels are assigned to geodesic extrema. The proposed method exploits the complementary characteristics of rigid and deformable parts, resulting in a significant improvement in the predictive accuracy of the limbs.
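
The Hough-style voting that the abstract describes (offsets cast as votes into an accumulator image, with the peak taken as the winner-takes-all estimate) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the vote weights, and the hand-written pixel offsets are all hypothetical, standing in for whatever the trained poselet detectors or regression forest would predict.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]           # (row, col) of a voting pixel or detection
Offset = Tuple[int, int, float]   # (d_row, d_col, weight): hypothetical prediction
Vote = Tuple[int, int, float]     # (row, col, weight) in accumulator coordinates


def cast_votes(pixels: List[Pixel], offsets: List[Offset]) -> List[Vote]:
    """Each pixel votes for a landmark at its own position plus the predicted offset."""
    return [(r + dr, c + dc, w) for (r, c), (dr, dc, w) in zip(pixels, offsets)]


def accumulate_votes(votes: List[Vote], height: int, width: int) -> List[List[float]]:
    """Sum weighted votes into a height x width Hough accumulator image."""
    acc = [[0.0] * width for _ in range(height)]
    for row, col, weight in votes:
        # Votes that land outside the image are discarded.
        if 0 <= row < height and 0 <= col < width:
            acc[row][col] += weight
    return acc


def peak(acc: List[List[float]]) -> Pixel:
    """Winner-takes-all: return the (row, col) of the largest accumulator cell."""
    best = (0, 0)
    for r, row_vals in enumerate(acc):
        for c, value in enumerate(row_vals):
            if value > acc[best[0]][best[1]]:
                best = (r, c)
    return best


# Toy example for a single landmark in an 8 x 8 image (all values are made up).
pixels = [(2, 2), (4, 5), (0, 0), (7, 7)]
offsets = [(1, 1, 1.0), (-1, -2, 0.8), (1, 1, 0.5), (5, 5, 1.0)]

acc = accumulate_votes(cast_votes(pixels, offsets), height=8, width=8)
best = peak(acc)  # two votes agree on (3, 3); the last vote falls outside the image
```

In practice one accumulator would be kept per landmark, and the accumulator is often smoothed before the peak is taken; both details are assumptions here rather than statements about the thesis.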
Supervisor: Bowden, R.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.640893
DOI: Not available