The n-tuple network: coping with sub-optimality
The subject of this thesis is the n-tuple net.work (RAMnet). The major advantage of RAMnets is their speed and the simplicity with which they can be implemented in parallel hardware. On the other hand, this method is not a universal approximator and the training procedure does not involve the minimisation of a cost function. Hence RAMnets are potentially sub-optimal. It is important to understand the source of this sub-optimality and to develop the analytical tools that allow us to quantify the generalisation cost of using this model for any given data. We view RAMnets as classifiers and function approximators and try to determine how critical their lack of' universality and optimality is. In order to understand better the inherent. restrictions of the model, we review RAMnets showing their relationship to a number of well established general models such as: Associative Memories, Kamerva's Sparse Distributed Memory, Radial Basis Functions, General Regression Networks and Bayesian Classifiers. We then benchmark binary RAMnet. model against 23 other algorithms using real-world data from the StatLog Project. This large scale experimental study indicates that RAMnets are often capable of delivering results which are competitive with those obtained by more sophisticated, computationally expensive rnodels. The Frequency Weighted version is also benchmarked and shown to perform worse than the binary RAMnet for large values of the tuple size n. We demonstrate that the main issues in the Frequency Weighted RAMnets is adequate probability estimation and propose Good-Turing estimates in place of the more commonly used :Maximum Likelihood estimates. Having established the viability of the method numerically, we focus on providillg an analytical framework that allows us to quantify the generalisation cost of RAMnets for a given datasetL. For the classification network we provide a semi-quantitative argument which is based on the notion of Tuple distance. It gives a good indication of whether the network will fail for the given data. A rigorous Bayesian framework with Gaussian process prior assumptions is given for the regression n-tuple net. We show how to calculate the generalisation cost of this net and verify the results numerically for one dimensional noisy interpolation problems. We conclude that the n-tuple method of classification based on memorisation of random features can be a powerful alternative to slower cost driven models. The speed of the method is at the expense of its optimality. RAMnets will fail for certain datasets but the cases when they do so are relatively easy to determine with the analytical tools we provide.