Title: Visual and thermal odometry with Deep Neural Networks
Author: Saputra, Muhamad
ISNI: 0000 0004 9355 6314
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2020
Accurate camera ego-motion estimation, widely known as Visual Odometry (VO), remains a key prerequisite for many applications in computer vision and robotics. Conventional VO, which relies on hand-crafted feature engineering, is prone to drift and can easily lose track because the extracted features contain outliers and unknown noise. This is problematic for applications that require robustness and a high level of accuracy, such as tracking a UAV's position in an underground tunnel or estimating a firefighter's position during an emergency operation. To alleviate the problem of noisy feature engineering, machine learning algorithms, especially Deep Neural Networks (DNNs), have been used in the past few years to automatically learn robust odometry features from large amounts of data. However, despite some promising results, several fundamental drawbacks remain in terms of accuracy, efficiency, and applicability to visually-denied environments. The work presented in this thesis tackles these shortcomings by proposing novel network architectures and optimization strategies for DNN-based odometry estimation. To address issues of accuracy and long-term consistency, we propose to train DNN-based VO using both a window-based composite transformation loss and a relative transformation loss through curriculum learning. With this approach, we improve the generalization ability of the network for translation and rotation by 21% and 16%, respectively. We also propose the use of an attention network to conditionally re-weight image features so that the network can produce more accurate poses whilst being more amenable to interpretation. This method improves translation and rotation estimation by 27.8% and 43.1%, respectively, over the model without attention. The second contribution deals with the efficiency problem of DNN-based VO by proposing the first distillation approach for camera pose regression.
We demonstrate that distilling knowledge from a deep pose regression network can be done effectively if we emphasize the knowledge transfer only when we trust the teacher network's prediction. We also show that a distilled network can be further compressed with factorization and could generalize better due to low-rank constraints. Our proposed approach can reduce the number of student parameters by up to 92.95% (2.12× faster) whilst keeping the prediction accuracy very close to that of the teacher. Finally, we deal with the issue of tracking in visually-denied environments by proposing the first DNN-based thermal-inertial odometry system. Since thermal images inherently lack robust features, we design the network not only to extract features from the thermal images, but also to hallucinate visual features given a thermal image as input. Through extensive evaluation across two datasets, we conclude that our proposed method can produce accurate odometry estimates with an average absolute trajectory error of less than 2 m.
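The difference between a relative transformation loss and a window-based composite transformation loss can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the loss definitions, so the planar (x, y, heading) pose representation, the squared-error metric, and the function names `compose`, `pose_error`, `relative_loss`, and `composite_loss` are all assumptions made purely for illustration.

```python
import math


def compose(a, b):
    """Chain two planar poses (x, y, theta): apply a, then b expressed in a's frame."""
    ax, ay, at = a
    bx, by, bt = b
    x = ax + math.cos(at) * bx - math.sin(at) * by
    y = ay + math.sin(at) * bx + math.cos(at) * by
    return (x, y, at + bt)


def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def pose_error(pred, true):
    """Squared translation error plus squared heading error (assumed metric)."""
    dx, dy = pred[0] - true[0], pred[1] - true[1]
    dth = wrap(pred[2] - true[2])
    return dx * dx + dy * dy + dth * dth


def relative_loss(pred, true):
    """Mean error over individual frame-to-frame relative poses."""
    return sum(pose_error(p, t) for p, t in zip(pred, true)) / len(pred)


def composite_loss(pred, true, window):
    """Mean error over relative poses composed across each sliding window."""
    if not 1 <= window <= len(pred):
        raise ValueError("window must be between 1 and the sequence length")
    total, count = 0.0, 0
    for start in range(len(pred) - window + 1):
        p_acc = t_acc = (0.0, 0.0, 0.0)  # identity pose
        for k in range(start, start + window):
            p_acc = compose(p_acc, pred[k])
            t_acc = compose(t_acc, true[k])
        total += pose_error(p_acc, t_acc)
        count += 1
    return total / count


# Hypothetical data: straight-line motion with a small constant heading bias.
true_rel = [(1.0, 0.0, 0.0)] * 4
pred_rel = [(1.0, 0.0, 0.01)] * 4
print(relative_loss(pred_rel, true_rel))
print(composite_loss(pred_rel, true_rel, window=4))
```

In this toy example each per-frame error is small, but the composed error over the window is larger because the heading bias accumulates. That accumulated drift is what a composite term exposes to training and what a purely relative term does not penalize.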
Supervisor: Trigoni, Agathoniki ; Markham, Andrew
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Machine Learning ; Cyber-Physical Systems ; Robotics