Use this URL to cite or link to this record in EThOS:
Title: Cross-modal classification and retrieval of multimodal data using combinations of neural networks
Author: Saragiotis, Panagiotis
ISNI:       0000 0001 3552 9172
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2006
Availability of Full Text:
Access from EThOS:
Access from Institution:
Current neurobiological thinking supported, in part, by experimentation stresses the importance of cross-modality. Uni-modal cognitive tasks, language and vision, for example, are performed with the help of many networks working simultaneously or sequentially; and for cross-modal tasks, like picture / object naming and word illustration, the output of these networks is combined to produce higher cognitive behaviour. The notion of multi-net processing is used typically in the pattern recognition literature, where ensemble networks of weak classifiers - typically supervised - appear to outperform strong classifiers. We have built a system, based on combinations of neural networks, that demonstrates how cross-modal classification can be used to retrieve multi-modal data using one of the available modalities of information. Two multi-net systems were used in this work: one comprising Kohonen SOMs that interact with each other via a Hebbian network and a fuzzy ARTMAP network where the interaction is through the embedded map field. The multi-nets were used for the cross-modal retrieval of images given keywords and for finding the appropriate keywords for an image. The systems were trained on two publicly available image databases that had collateral annotations on the images. The Hemera collection, comprising images of pre-segmented single objects, and the Corel collection with images of multiple objects were used for automatically generating various sets of input vectors. We have attempted to develop a method for evaluating the performance of multi-net systems using a monolithic network trained on modally-undifferentiated vectors as an intuitive bench-mark. To this extent single SOM and fuzzy ART networks were trained using a concatenated visual / linguistic vector to test the performance of multi-net systems with typical monolithic systems. Both multi-nets outperform the respective monolithic systems in terms of information retrieval measures of precision and recall on test images drawn from both datasets; the SOM multi-net outperforms the fuzzy ARTMAP both in terms of convergence and precision-recall. The performance of the SOM-based multi-net in retrieval, classification and auto-annotation is on a par with that of state of the art systems like "ALIP" and "Blobworld". Much of the neural network based simulations reported in the literature use supervised learning algorithms. Such algorithms are suited when classes of objects are predefined and objects in themselves are quite unique in terms of their attributes. We have compared the performance of our multi-net systems with that of a multi-layer perceptron (MLP). The MLP does show substantially greater precision and recall on a (fixed) class of objects when compared with our unsupervised systems. However when 'lesioned' -the network connectivity 'damaged' deliberately- the multi-net systems show a greater degree of robustness. Cross-modal systems appear to hold considerable intellectual and commercial potential and the multi-net approach facilitates the simulation of such systems.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available