Title: Recovering 6D object pose at the level of instances and categories
Author: Sahin, Caner
ISNI:       0000 0004 7659 022X
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2019
6D object pose estimation is an important problem in computer vision: it determines the 3D position and 3D rotation of an object in camera-centered coordinates. It has been studied extensively over the past decade because it is of great importance to many rapidly evolving technological areas, such as robotics and augmented reality. This thesis addresses the problem at the level of both instances and categories.

At the level of instances, the source data from which a classifier is learnt share the same statistical distribution as the target data on which the classifier is tested. Instance-level 6D object pose estimation estimates the 6D poses of seen objects, and its main challenges are viewpoint variability, occlusion, clutter, and similar-looking distractors. At the level of categories, on the other hand, there is a distribution shift between the source and target domains. Category-level 6D object pose estimation estimates the 6D poses of unseen objects of a given category, and its main challenges are high intra-class variation and shape discrepancies between objects. This thesis is philosophically built upon these two families of 6D object pose estimation problems and their corresponding challenges.

This thesis approaches the instance-level problem in two ways. Firstly, it investigates the current position of the computer vision field on instance-level 6D object pose estimation by presenting thorough multi-modal analyses of the problem. The challenges of instances are examined on RGB-D images by comparing the performance of several 6D detectors, in order to answer the following questions: What is the current position of the computer vision community for maintaining "automation" in robotic manipulation? What next steps should the community take to improve "autonomy" in robotics while handling objects? Secondly, it introduces a novel part-based random forest architecture, the Iterative Hough Forest (IHF). This architecture can estimate the 6D pose of occluded and cluttered objects given a candidate 2D bounding box. It is learnt using parts extracted only from positive samples. These parts are represented with the Histogram of Control Points (HoCP), a "scale-variant" implicit volumetric description derived from the recently introduced Implicit B-Splines (IBS). The rich discriminative information provided by this scale variance is leveraged during inference, where the initial pose estimate of the object is iteratively refined based on more discriminative control points.

The thesis next addresses 6D object pose estimation at the level of categories in the depth modality. It introduces a novel part-based architecture that tackles the challenges of categories. This architecture adapts to the distribution shifts arising from shape discrepancies and naturally removes variations of texture, illumination, and pose; hence, it is called the "Intrinsic Structure Adaptor (ISA)". ISA is engineered on the following basis:
i) "Semantically Selected Centers (SSC)" are proposed to define the "6D pose" at the level of categories.
ii) 3D skeleton structures, derived as shape-invariant features, are used to represent the parts extracted from the instances of given categories, and privileged one-class learning is employed based on these parts.
iii) Graph matching is performed during training so as to improve the adaptation/generalization capability of the proposed architecture across unseen instances.
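The abstract defines a 6D pose as a 3D rotation plus a 3D translation in camera-centered coordinates. The minimal Python sketch below is an illustration, not code from the thesis. It represents a pose as a rotation matrix R and a translation vector t, maps model points into the camera frame, and scores an estimate against ground truth with an average-distance error of the kind commonly used to compare 6D detectors (the ADD metric). The helper names `rot_z`, `apply_pose` and `add_error` are hypothetical, and the thesis's own evaluation protocol may differ.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rot_z(theta: float) -> Mat3:
    """Hypothetical helper: rotation matrix about the camera z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def apply_pose(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map a model-frame point p into camera coordinates: R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_error(R_est: Mat3, t_est: Vec3, R_gt: Mat3, t_gt: Vec3,
              model_points: Sequence[Vec3]) -> float:
    """ADD-style error: mean Euclidean distance between each model point
    transformed by the estimated pose and by the ground-truth pose."""
    total = 0.0
    for p in model_points:
        a = apply_pose(R_est, t_est, p)
        b = apply_pose(R_gt, t_gt, p)
        total += math.dist(a, b)
    return total / len(model_points)


if __name__ == "__main__":
    # Toy "model": three points 10 cm along each axis (units assumed to be metres).
    model = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
    R_gt, t_gt = rot_z(0.0), (0.0, 0.0, 1.0)
    # Estimate with the correct rotation but a 2 cm depth offset.
    R_est, t_est = rot_z(0.0), (0.0, 0.0, 1.02)
    print(round(add_error(R_est, t_est, R_gt, t_gt, model), 4))  # prints 0.02
```

Because a pure translation offset moves every model point by the same amount, the error equals that offset. A convention often used in the literature accepts an estimate when this error is below 10% of the object's diameter; it is mentioned here only as context, not as the thesis's criterion.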
Supervisor: Kim, Tae-Kyun Sponsor: Turkey Ministry of National Education
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral