Title: Athlete pose estimation from single-view TV broadcast footage
Author: Fastovets, Mykyta
ISNI: 0000 0004 6061 6210
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2017
This thesis presents a novel framework for the semi-automatic estimation of athlete pose in television-quality, single-view sports broadcast footage. Human pose estimation is an important problem in computer vision and has received much interest from the research community because of its wide range of applications. The focus here is on achieving accurate pose estimates on sports video sequences, with the assistance of a human operator in a broadcast studio setting, that can be used to drive post-action analysis and graphical overlays.

A method for extracting and tracking off-the-shelf scale-invariant features on athletes is tested. Evaluation shows that such features are ill-suited to tracking articulated motion because of drift, data association errors, and a general lack of stable features to track.

A keyframe-driven approach, inspired by the Pictorial Structures model, is developed for estimating the 2D pose of athletes in sports sequences. This approach models the human body as a tree of loosely linked parts and introduces a temporal smoothness term aimed at keeping the pose consistent throughout the sequence. The evaluation demonstrates that the approach is able to extract human pose from such videos, but requires a significant amount of manual interaction to reach the accuracy required for broadcast settings.

A novel non-sequential method that uses minimum spanning trees to maximise the benefit from manually annotated keyframe poses is developed. The algorithm serves two purposes: keyframe selection and keyframe information propagation. Optimal keyframes are automatically selected and suggested to the operator for labelling. Once labelled, information from these keyframes is propagated throughout the sequence, and automatically generated keyframes are created in visually similar frames. Qualitative and quantitative evaluation demonstrates an increase in accuracy and a decrease in the number of required keyframes.

Finally, a geometric method for converting 2D poses into 3D is developed. The algorithm assumes a weak perspective projection for the video sequence and known relative limb lengths for the athlete, and is able to recover the relative scale, given at least three labelled keyframes, by solving a continuous optimisation problem. Evaluation against a baseline geometric method shows improved stability and lower residual error.
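The geometric constraint behind 2D-to-3D conversion under weak perspective with known limb lengths can be illustrated with a minimal Python sketch. It shows only the classical foreshortening relation for a single limb and is not the thesis's optimisation over keyframes. The function names, the argument layout, and the minimum-scale heuristic are assumptions made for illustration.

```python
import math


def relative_depth(p1, p2, limb_length, scale):
    """Magnitude of the depth difference between the two joints of a limb.

    Assumes weak perspective: image coordinates are (u, v) = scale * (X, Y),
    so a limb of 3D length L whose projection has 2D length d satisfies
    L**2 = (d / scale)**2 + dZ**2. The sign of dZ is not recoverable from
    a single view and is left to the caller.
    """
    du = (p1[0] - p2[0]) / scale
    dv = (p1[1] - p2[1]) / scale
    squared = limb_length ** 2 - (du ** 2 + dv ** 2)
    if squared < 0:
        raise ValueError("projection is too long for this limb length and scale")
    return math.sqrt(squared)


def minimum_feasible_scale(joints_2d, limbs, limb_lengths):
    """Smallest scale at which every limb has a real-valued depth.

    joints_2d maps a joint id to (u, v); limbs is a list of (joint_a, joint_b)
    pairs; limb_lengths gives the known relative length of each limb.
    """
    return max(
        math.dist(joints_2d[a], joints_2d[b]) / length
        for (a, b), length in zip(limbs, limb_lengths)
    )
```

For a single frame, `minimum_feasible_scale` gives only a lower bound on the scale. The abstract states that the thesis recovers the relative scale from at least three labelled keyframes through continuous optimisation, which this sketch does not attempt.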
Supervisor: Hilton, A. ; Guillemaut, J-Y.
Sponsor: BBC R&D ; University of Surrey
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available