Neural networks and adaptive computers : theory and methods of stochastic adaptive computation
This thesis studies the theory of stochastic adaptive computation based on neural networks. A mathematical theory of computation is developed in the framework of information geometry, which generalises Turing machine (TM) computation in three aspects - It can be continuous, stochastic and adaptive - and retains the TM computation as a subclass called "data processing". The concepts of Boltzmann distribution, Gibbs sampler and simulated annealing are formally defined and their interrelationships are studied. The concept of "trainable information processor" (TIP) - parameterised stochastic mapping with a rule to change the parameters - is introduced as an abstraction of neural network models. A mathematical theory of the class of homogeneous semilinear neural networks is developed, which includes most of the commonly studied NN models such as back propagation NN, Boltzmann machine and Hopfield net, and a general scheme is developed to classify the structures, dynamics and learning rules. All the previously known general learning rules are based on gradient following (GF), which are susceptible to local optima in weight space. Contrary to the widely held belief that this is rarely a problem in practice, numerical experiments show that for most non-trivial learning tasks GF learning never converges to a global optimum. To overcome the local optima, simulated annealing is introduced into the learning rule, so that the network retains adequate amount of "global search" in the learning process. Extensive numerical experiments confirm that the network always converges to a global optimum in the weight space. The resulting learning rule is also easier to be implemented and more biologically plausible than back propagation and Boltzmann machine learning rules: Only a scalar needs to be back-propagated for the whole network. Various connectionist models have been proposed in the literature for solving various instances of problems, without a general method by which their merits can be combined. Instead of proposing yet another model, we try to build a modular structure in which each module is basically a TIP. As an extension of simulated annealing to temporal problems, we generalise the theory of dynamic programming and Markov decision process to allow adaptive learning, resulting in a computational system called a "basic adaptive computer", which has the advantage over earlier reinforcement learning systems, such as Sutton's "Dyna", in that it can adapt in a combinatorial environment and still converge to a global optimum. The theories are developed with a universal normalisation scheme for all the learning parameters so that the learning system can be built without prior knowledge of the problems it is to solve.