Title: Fast online model learning for controlling complex real-world robots
Author: Loviken, Pontus
ISNI: 0000 0004 8503 7119
Awarding Body: University of Plymouth
Current Institution: University of Plymouth
Date of Award: 2019
How can real robots with many degrees of freedom, without previous knowledge of themselves or their environment, act and use the resulting observations to efficiently develop the ability to generate a wide set of useful behaviours? This thesis presents a novel framework that enables physical robots with many degrees of freedom to rapidly learn models for control from scratch. This can be done in previously inaccessible problem domains characterised by a lack of direct mappings from motor actions to outcomes, as well as state and action spaces too large for the full forward dynamics to be learned and used explicitly. The proposed framework copes with these issues by combining a set of local Goal Babbling models, each of which maps every outcome in a low-dimensional task space to a specific action, with a sparse higher-level Reinforcement Learning model that learns to navigate between the contexts from which each Goal Babbling model can be used. The two types of models can then be learned online and in parallel, using only the data a robot can collect by interacting with its environment. To show the potential of the approach, we present two possible implementations of the framework on two separate robot platforms: a simulated planar arm with up to 1,000 degrees of freedom, and a real humanoid robot with 25 degrees of freedom. The results show that learning is rapid and essentially unaffected by the number of degrees of freedom of the robot, allowing for the generation of complex behaviours and skills after a relatively short training time. The planar arm is able to strategically plan series of motions in order to move its end-effector between any two parts of a crowded environment within 10,000 iterations. The humanoid robot is able to freely transition between states such as lying on the back, belly, and sides, and occasionally also sitting up, within only 1,000 iterations. This corresponds to 30 to 60 minutes of real-world interactions.
The main contribution of this thesis is a framework for solving a control learning problem that was previously largely unexplored and had no obvious solutions, but that has strong analogies to, for example, early learning of body orientation control in infants. This thesis examined two quite different implementations of the proposed framework, and showed success in both cases on two different control learning problems.
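
As a rough illustration of the Goal Babbling component described in the abstract, the following minimal sketch learns an inverse model for a simulated planar arm by sampling goals in the 2-D task space rather than in joint space. It is not the thesis's implementation: the nearest-neighbour inverse model, the Gaussian exploration noise, and all names (`forward`, `NearestNeighbourInverse`, `goal_babbling`) are assumptions made for this example, and the higher-level Reinforcement Learning model is omitted entirely.

```python
import math
import random


def forward(angles, link_len=1.0):
    """End-effector (x, y) of a planar arm; each joint angle is relative to the previous link."""
    x = y = total = 0.0
    for a in angles:
        total += a
        x += link_len * math.cos(total)
        y += link_len * math.sin(total)
    return (x, y)


class NearestNeighbourInverse:
    """Hypothetical inverse model: returns the stored action whose outcome lay closest to the goal."""

    def __init__(self):
        self.memory = []  # list of (outcome, action) pairs observed so far

    def add(self, outcome, action):
        self.memory.append((outcome, list(action)))

    def query(self, goal):
        return min(self.memory, key=lambda pair: math.dist(pair[0], goal))


def goal_babbling(n_joints=20, iterations=500, noise=0.05, seed=0):
    """Explore by choosing goals in task space and perturbing the best known action."""
    rng = random.Random(seed)
    link_len = 1.0 / n_joints  # total arm length is 1 (an assumption of this sketch)
    model = NearestNeighbourInverse()

    # Start from a single known posture: the fully stretched arm.
    home = [0.0] * n_joints
    model.add(forward(home, link_len), home)

    for _ in range(iterations):
        # The goal is sampled in the 2-D task space, independent of the number of joints.
        goal = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        _, action = model.query(goal)
        # Try a noisy variant of the best known action and record what actually happened.
        trial = [a + rng.gauss(0.0, noise) for a in action]
        model.add(forward(trial, link_len), trial)

    return model, link_len
```

A call such as `model, link_len = goal_babbling(n_joints=100)` followed by `outcome, action = model.query((0.3, 0.4))` returns the closest outcome observed so far and the joint angles that produced it. Because goals are drawn in the task space, the loop itself does not change with the number of joints, which is the property the abstract highlights; how quickly this particular sketch covers the task space is not something it is meant to demonstrate.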
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: model learning ; Reinforcement learning ; Online learning ; Goal babbling ; inverse models ; Micro data learning ; Developmental robotics ; real-world robots ; sensorimotor control