Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.769677
Title: Hardware architectures for machine learning training
Author: Shao, Shengjia
ISNI: 0000 0004 7658 9042
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2019
Abstract:
Machine learning applications are computationally expensive, but they can benefit from hardware acceleration. In recent years, there has been substantial research on accelerating the inference stage of machine learning models, but work on the training stage has been limited. This thesis explores hardware acceleration of the training stage and makes three contributions. The first contribution is incremental and decremental training of Support Vector Regression (SVR). At the algorithm level, we handle the boundary case ignored by existing algorithms to ensure correctness. At the hardware level, we design an efficient hardware architecture that circumvents algorithmic obstacles. An FPGA implementation of the proposed SVR training scheme is evaluated with high-frequency financial data. Our FPGA-based system achieves significant speed-up over CPU and GPU. The second contribution is a hardware architecture for Trust Region Policy Optimisation (TRPO), an advanced Reinforcement Learning algorithm. The main novelty is the use of Pearlmutter Propagation to circumvent algorithm-level obstacles and enable a streamlined hardware implementation. The proposed hardware architecture, implemented on FPGA, is evaluated with robotic control benchmarks. It achieves significant speed-up over machine learning libraries running on CPU and GPU. The third contribution is the application of hardware-accelerated TRPO training to real robotic control. We propose a workflow of training the controller in simulation with TRPO and then porting it to the real robot. In addition to the hardware-accelerated TRPO, a Customised Lightweight Simulator running on FPGA significantly speeds up the simulation process. The workflow is applied to a real robot arm, and several experiments illustrate its effectiveness in controlling the arm.
Supervisor: Luk, Wayne
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.769677