Title: SLAM and deep learning for 3D indoor scene understanding
Author: McCormac, Brendan John
ISNI: 0000 0004 7659 1222
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2019
We build upon research in the fields of Simultaneous Localisation and Mapping (SLAM) and Deep Learning to develop 3D maps of indoor scenes that describe not only where things are but also what they are. We focus on real-time online methods suitable for applications such as domestic robotics and augmented reality. While early approaches to SLAM used sparse feature maps for localisation, recent years have seen the advent of real-time dense SLAM systems, which have enabled applications not possible with only sparse feature maps. Further augmenting dense maps with semantic information will in the future enable more intelligent domestic robots and more intuitive human-map interactions that are not possible with map geometry alone.

Early work presented here sought to combine recent advances in semantic segmentation using Convolutional Neural Networks (CNNs) with dense SLAM approaches to produce a semantically annotated dense 3D map. Although we found this combination improved segmentation performance, its inherent limitations subsequently led to a paradigm shift away from semantic annotation and towards instance detection and 3D object-level mapping. We propose a new type of SLAM system consisting of discovered object instances that are reconstructed online in individual volumes. We develop a new approach to robustly combine multiple associated 2D instance mask detections into a fused 3D foreground segmentation for each object. The use of individual volumes allows the relative poses of objects to be optimised in a pose graph, producing a consistent global map in which objects can be reused on loopy trajectories, and which can improve reconstruction quality. A notable feature of CNNs is their ability to make use of large annotated datasets, so we also explore methods to reduce the cost of producing indoor semantic datasets.
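The idea of fusing multiple associated 2D instance mask detections into a per-object 3D foreground segmentation can be illustrated with a toy sketch. This is not the thesis implementation: the `ObjectVolume` class, its voxel shape, and the simple per-voxel foreground/background detection counts with a 0.5 threshold are all illustrative assumptions, standing in for projecting camera-frame masks into a fused object volume.

```python
import numpy as np

class ObjectVolume:
    """Toy per-object volume (names hypothetical): each voxel keeps
    foreground/background detection counts, fused across frames."""

    def __init__(self, shape=(8, 8, 8)):
        self.fg = np.zeros(shape)  # votes that the voxel belongs to the object
        self.bg = np.zeros(shape)  # votes that it does not

    def integrate_mask(self, mask):
        """mask: boolean volume, True where a 2D instance detection
        (already projected into this volume) marks a voxel foreground."""
        self.fg[mask] += 1
        self.bg[~mask] += 1

    def foreground(self, threshold=0.5):
        """Fused segmentation: voxels whose foreground probability,
        estimated from the detection counts, exceeds the threshold."""
        prob = self.fg / np.maximum(self.fg + self.bg, 1)
        return prob > threshold

vol = ObjectVolume(shape=(4, 4, 4))
m = np.zeros((4, 4, 4), dtype=bool)
m[:2] = True                       # two detections agree on the front half
vol.integrate_mask(m)
vol.integrate_mask(m)
vol.integrate_mask(np.ones((4, 4, 4), dtype=bool))  # one spurious detection
seg = vol.foreground()             # outvoted voxels are rejected
```

Counting agreeing and disagreeing detections per voxel is what makes the fusion robust: a single noisy mask cannot flip voxels that most observations label background.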
We explore SLAM as a means of mitigating the labour-intensive annotation of video data, but found that producing a large-scale dataset with such an approach would still require significant resources. We therefore explore automated methods to produce a large-scale photorealistic synthetic dataset of indoor trajectories at low cost, and we verify the benefits of the dataset on the task of semantic segmentation. To automate trajectory generation we present a novel two-body random trajectory method that mitigates the issues of a completely random approach, and which has subsequently been used in other synthetic indoor datasets.
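The two-body idea can be sketched as follows: two points take damped random walks inside the room, the camera sits at the first point and looks towards the second, so viewpoints vary smoothly instead of spinning arbitrarily as fully random pose sampling would. This is only an illustrative sketch, not the thesis method; the function name and all parameters (step count, bounds, damping, acceleration scale) are hypothetical.

```python
import numpy as np

def two_body_trajectory(steps=100, bounds=(0.0, 5.0), dt=0.1,
                        accel=1.0, damping=0.95, seed=0):
    """Sketch of a two-body random trajectory (parameters hypothetical):
    body 0 carries the camera, body 1 is the look-at target; both follow
    damped random walks clipped to the room bounds."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(bounds[0], bounds[1], size=(2, 3))
    vel = np.zeros((2, 3))
    poses = []
    for _ in range(steps):
        # Random acceleration with velocity damping keeps motion smooth.
        vel = damping * vel + accel * rng.normal(size=(2, 3)) * dt
        pos = np.clip(pos + vel * dt, bounds[0], bounds[1])
        gaze = pos[1] - pos[0]                 # camera looks at the target body
        gaze = gaze / (np.linalg.norm(gaze) + 1e-9)
        poses.append((pos[0].copy(), gaze))
    return poses

traj = two_body_trajectory()  # list of (camera position, viewing direction)
```

Because both bodies move continuously, consecutive camera poses differ only slightly, which is what makes the rendered trajectories resemble handheld or robot-mounted video rather than unrelated random snapshots.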
Supervisor: Davison, Andrew; Leutenegger, Stefan
Sponsor: James Dyson Foundation
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral