Title:

A VLSI hardware neural accelerator using reduced precision arithmetic

A synthetic neural network is a massively parallel array of computational units (neurons) that captures some of the functionality and computational strengths of the brain. Its potential capabilities include the ability to consider many solutions simultaneously, the ability to work with corrupted or incomplete data without any explicit error correction, and a natural fault tolerance. The fault tolerance arises from the parallelism of the network and from the distributed representation of knowledge, which together give rise to graceful degradation as faults appear. In engineering terms, a neuron can be thought of as a state machine that signals its 'on' state by the presence of a voltage. The level of excitation of the neuron is represented by its level of activity. The activity is related to the neural state by an activation function, usually the 'sigmoid' or 'S-shape' function, which represents a smooth switching of the neural state from off to on as the activity increases through a threshold. The level of activity is changed by direct stimulation of the neuron from outside the network and by contributions from other neurons in the network. The firing levels of the neurons feeding a receiving neuron are weighted by interneural synaptic weights. The weights are the long-term memory storage elements of the network: by altering their values, information is encoded, or 'learnt', by the network, adding to its store of knowledge. Neural network research can be divided into three broad categories: mathematical description and analysis of the dynamical learning properties of networks, computer simulation of the mathematical methods, and VLSI hardware implementation of neural functions or classes of neural networks. The main thrust of this thesis falls into the final category. The research presented here implements a VLSI digital neural network as a neural accelerator to speed up simulation times.
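The neuron model described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the gain and threshold parameters, and the example values are assumptions for exposition, not taken from the thesis.

```python
import math

def sigmoid(activity, threshold=0.0, gain=1.0):
    """The 'S-shape' activation function: a smooth switching of
    neural state from off (0) to on (1) as the activity increases
    through a threshold. Gain and threshold values are illustrative."""
    return 1.0 / (1.0 + math.exp(-gain * (activity - threshold)))

def neuron_activity(states, weights, external_input=0.0):
    """Total activity of a receiving neuron: direct stimulation from
    outside the network plus the firing levels of the sending neurons,
    each weighted by its interneural synaptic weight."""
    return external_input + sum(w * s for w, s in zip(weights, states))

# Example: three sending neurons feed one receiving neuron.
states = [0.0, 0.9, 1.0]          # firing levels of sending neurons
weights = [0.5, -0.2, 0.8]        # synaptic weights (the long-term memory)
a = neuron_activity(states, weights, external_input=0.1)
s = sigmoid(a, threshold=0.5, gain=4.0)
```

Learning then consists of altering the weight values so that the network's states encode the desired information.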
The VLSI design incorporates a parallel array of synapses, which provide the connections between neurons. Each synapse effectively 'multiplies' the neural state of the sending neuron by the synaptic weight between the sending neuron and the receiving neuron. The 'multiplication' is achieved using reduced precision arithmetic: a 'staircase' activation function, modelled on the sigmoid activation function, allows the neuron to be in any one of five states. Therefore, with little loss in precision, the reduced precision arithmetic avoids full multiplication, which is expensive in silicon area. The reduced arithmetic synapse increases the number of synapses that can be implemented on a single die. The VLSI neural network chips can easily be cascaded to give a larger array of synapses; four cascaded chips gave an array of 108 synapses. However, an array of this size was too small to perform neural network learning simulations. The synapse array has therefore been configured in a paging architecture, which trades off some of the high speed of the chips (up to 20 MHz) against increased network size. The synapse array has been wired with support circuitry onto a board to give a neural accelerator that is interfaced to a host Sun computer. The paging architecture of the board allows a network of several hundred neurons to be simulated. The neural accelerator has been used with the delta learning rule algorithm, and results show a speed-up of up to two orders of magnitude over equivalent software simulations.
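Why five states let the synapse avoid a full multiplier can be sketched in Python. Everything specific here is an assumption for illustration, not the chip's actual design: the state coding {-2, -1, 0, +1, +2}, the staircase thresholds, and the use of integer weights are all hypothetical.

```python
def staircase_state(activity, thresholds=(-0.75, -0.25, 0.25, 0.75)):
    """Staircase activation modelled on the sigmoid: the activity is
    quantised to one of five discrete neural states as it crosses
    successive thresholds. Threshold values are illustrative."""
    state = -2
    for t in thresholds:
        if activity > t:
            state += 1
    return state

def synapse(weight, state):
    """Reduced precision 'multiplication' of an integer weight by a
    five-state neural value. Because |state| is 0, 1, or 2, the product
    needs only a zero, a pass-through, or a one-bit left shift, plus a
    sign change -- no full multiplier array is required."""
    magnitude = {0: 0, 1: weight, 2: weight << 1}[abs(state)]
    return -magnitude if state < 0 else magnitude
```

In hardware terms, the select-and-shift structure of `synapse` is far cheaper in silicon area than a full parallel multiplier, which is what allows many more synapses to fit on a single die.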

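The delta learning rule mentioned above can be sketched in its textbook (Widrow-Hoff) form as a per-pattern weight update. This is a minimal illustration of the rule itself; the variable names and learning rate are assumptions, and the thesis's exact formulation on the hardware is not reproduced here.

```python
def delta_rule_update(weights, inputs, target, output, rate=0.1):
    """One delta-rule weight update: each synaptic weight moves in
    proportion to the output error (target - output) times the input
    it carries, scaled by a learning rate."""
    error = target - output
    return [w + rate * error * x for w, x in zip(weights, inputs)]

# Example: the output fell short of the target, so weights on
# active inputs are increased.
new_w = delta_rule_update([0.0, 0.0], [1.0, 2.0], target=1.0, output=0.0)
```

On the accelerator, the weighted-sum part of each such update is what the paged synapse array computes in parallel, which is where the reported two-orders-of-magnitude speed-up over software simulation comes from.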