Reinforcement learning of visually guided spatial goal directed movement
A range of visually guided, spatial goal-directed tasks is investigated using a computational neuroethology approach. Animats are embedded within a bounded 2-D environment and map a 1-D visual array, through a convolution network, to a topography-preserving motor array that stochastically determines the direction of movement. Temporal difference reinforcement learning modifies the convolution network in response to a reinforcement signal received only at the goal location. Three forms of visual coding are compared: multiscale coding, where the visual array is convolved with Laplacian of Gaussian filters at a range of spatial scales before convolution to determine the motor array; rectified multiscale coding, where the multiscale array is split into positive and negative components; and intensity coding, where the unfiltered visual array is convolved to determine the motor array. After learning, animats are examined in terms of performance, behaviour and internal structure.
When animats learn to approach a solitary circle of randomly varying contrast, rectified multiscale coding animats outperform multiscale and intensity coding animats in both independent and coarse-scale noise conditions. Analysis of the learned internal structure shows that rectified multiscale filtering facilitates learning by enabling detection of the circle at the scales least affected by noise.
Cartwright and Collett (1983) showed that honeybees learn the angle subtended by a featureless landmark to guide movement to a food source at a fixed distance from the landmark and, furthermore, that when tested with only the edges of the landmark they still search in the same location. In a simulation of this experiment, animats are reinforced for moving to where the angle subtended by a solitary circle falls within a certain range. Rectified multiscale filtering leads to better-performing animats, with fewer hidden units, in both independent and coarse-scale visual noise conditions, though for different reasons in each case. Only those animats with rectified multiscale filtering that learn in the presence of coarse-scale noise show generalisation similar to that of the honeybees.
Collett, Cartwright and Smith (1986) trained gerbils to search at locations relative to arrangements of landmarks and tested their search patterns in modifications of the training arrangements. These experiments are simulated with landmark distance coded as either a 1-D intensity array or a 2-D vector array, plus a simple compass sense. Vector coding animats significantly outperform those using intensity coding and do so with fewer hidden units. Furthermore, vector coding animats show a close match to gerbil behaviour in tests with modified landmark arrangements.
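The three visual codings and the stochastic motor array described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a panoramic (wrap-around) visual array, a single convolution layer with no hidden units, and a softmax rule for choosing the direction of movement, none of which are specified here. All function names, the filter radius and the temperature parameter are hypothetical.

```python
import math
import random


def log_kernel(sigma):
    """1-D Laplacian of Gaussian at scale sigma, shifted to zero mean."""
    radius = int(math.ceil(3 * sigma))  # assumed truncation at 3 sigma
    k = [((x * x - sigma * sigma) / sigma ** 4)
         * math.exp(-x * x / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]  # uniform input then gives zero response


def convolve_circular(signal, kernel):
    """Same-length filtering with wrap-around (assumes a panoramic array).

    The kernel is not flipped, which is equivalent to convolution for the
    symmetric kernels used in this sketch.
    """
    n, r = len(signal), len(kernel) // 2
    return [sum(kernel[j] * signal[(i + j - r) % n]
                for j in range(len(kernel)))
            for i in range(n)]


def multiscale(visual, sigmas):
    """Multiscale coding: one filtered copy of the visual array per scale."""
    return [convolve_circular(visual, log_kernel(s)) for s in sigmas]


def rectify(channels):
    """Rectified multiscale coding: split each channel into + and - parts."""
    out = []
    for ch in channels:
        out.append([max(v, 0.0) for v in ch])
        out.append([max(-v, 0.0) for v in ch])
    return out


def motor_array(channels, weights):
    """Sum of per-channel convolutions; weights stand in for learned kernels."""
    motor = [0.0] * len(channels[0])
    for ch, w in zip(channels, weights):
        motor = [m + r for m, r in zip(motor, convolve_circular(ch, w))]
    return motor


def sample_direction(motor, temperature=1.0, rng=random):
    """Pick a movement direction stochastically (softmax is an assumption)."""
    top = max(motor)
    exps = [math.exp((v - top) / temperature) for v in motor]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(motor)), weights=probs)[0], probs


# Intensity coding would pass [visual] straight to motor_array;
# here the rectified multiscale route is shown on a toy 32-element array.
toy_visual = [1.0 if 10 <= i < 14 else 0.0 for i in range(32)]
toy_channels = rectify(multiscale(toy_visual, sigmas=[1.0, 2.0]))
toy_weights = [[0.0, 1.0, 0.0] for _ in toy_channels]  # placeholder kernels
direction, _ = sample_direction(motor_array(toy_channels, toy_weights))
```

In this sketch the motor array has the same length and ordering as the visual array, which is one simple way to preserve topography. Temporal difference learning would adjust the entries of the weight kernels; that step is omitted here.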