Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.298086
Title: Continuous, speaker-independent, speech recognition for a speech to viseme translator
Author: Kelleher, Holly
ISNI: 0000 0001 3596 5118
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 1999
Abstract:
The work presented in this thesis forms part of a research project which attempts to generate a visualisation of a speaker's mouth from purely acoustic speech signals. The aim is to provide an aid for partially hearing-impaired people in which visual information is presented alongside limited acoustic signals, facilitating easier use of the telephone. The system is essentially a low-level speech recogniser in which phonemic information is extracted from the speech waveform and mapped onto visemes generated on a synthetic facial image. This thesis presents a description of a major part of this project, that is, the development of an accurate phoneme discriminator which is capable of speaker-independent operation on continuous speech. The recognition process is realised in three stages: a pre-processor to convert the speech into a suitable parametric form; a pattern recogniser to identify the possible phoneme classes; and a post-processor to produce the viseme information. The pattern recognition stage uses a self-organising Kohonen network, followed by a Learning Vector Quantiser (LVQ) to further improve the recognition accuracy. The performance of this stage is highly dependent on the choice of pre-processor used at the input to the network, and it is the design of the pre-processor stage that forms a significant part of this work. A novel technique known as the pseudo-cepstrum forms the basis of this pre-processor. Extensive investigations have been conducted into the dependence of performance on a range of parameters, both at the pre-processor stage and within the Kohonen classifier. In particular, a performance comparison of several pre-processor techniques, including the pseudo-cepstrum, has been carried out. Factors affecting both the training and operation of the classifier are also described here, with the sensitivity of recognition performance to the input data being a major issue. Overall recognition accuracies of 80% have been achieved.
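The pattern recognition stage described in the abstract (a self-organising Kohonen network followed by an LVQ refinement) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the one-dimensional map, the learning-rate and neighbourhood schedules, the choice of the LVQ1 variant, and all function names. Random two-dimensional points stand in for the pre-processor output; the pseudo-cepstrum itself is not reproduced.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(codebook, x, candidates=None):
    """Index of the closest codebook vector, optionally among given indices."""
    idx = range(len(codebook)) if candidates is None else candidates
    return min(idx, key=lambda i: dist2(codebook[i], x))

def make_toy_data(n_per_class=100, seed=0):
    """Two Gaussian clusters standing in for feature vectors (assumption)."""
    rng = random.Random(seed)
    centres = {"a": (0.0, 0.0), "b": (3.0, 3.0)}
    data, labels = [], []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            data.append((rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)))
            labels.append(label)
    return data, labels

def train_som(data, n_units, epochs=30, lr0=0.5, seed=0):
    """Unsupervised 1-D Kohonen map: winner and neighbours move toward x."""
    rng = random.Random(seed)
    codebook = [list(rng.choice(data)) for _ in range(n_units)]
    radius0 = n_units / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):
            win = nearest(codebook, x)
            for i, unit in enumerate(codebook):
                h = math.exp(-((i - win) ** 2) / (2 * radius ** 2))
                for d in range(len(unit)):
                    unit[d] += lr * h * (x[d] - unit[d])
    return codebook

def label_units(codebook, data, labels):
    """Give each unit the majority label of the training vectors it wins."""
    votes = [{} for _ in codebook]
    for x, y in zip(data, labels):
        v = votes[nearest(codebook, x)]
        v[y] = v.get(y, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]

def train_lvq1(codebook, unit_labels, data, labels, epochs=20, lr0=0.1, seed=0):
    """LVQ1: pull the winner toward x if labels agree, push it away if not."""
    rng = random.Random(seed)
    labelled = [i for i, lab in enumerate(unit_labels) if lab is not None]
    order = list(range(len(data)))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        rng.shuffle(order)
        for j in order:
            x, y = data[j], labels[j]
            win = nearest(codebook, x, labelled)
            sign = 1.0 if unit_labels[win] == y else -1.0
            for d in range(len(x)):
                codebook[win][d] += sign * lr * (x[d] - codebook[win][d])
    return codebook

def classify(codebook, unit_labels, x):
    """Label of the nearest unit that received a label during calibration."""
    labelled = [i for i, lab in enumerate(unit_labels) if lab is not None]
    return unit_labels[nearest(codebook, x, labelled)]

if __name__ == "__main__":
    train_x, train_y = make_toy_data(seed=1)
    codebook = train_som(train_x, n_units=4)
    unit_labels = label_units(codebook, train_x, train_y)
    train_lvq1(codebook, unit_labels, train_x, train_y)
    test_x, test_y = make_toy_data(seed=2)
    hits = sum(classify(codebook, unit_labels, x) == y
               for x, y in zip(test_x, test_y))
    print("toy accuracy:", hits / len(test_x))
```

The two-phase structure is the point of the sketch: the map is first organised without labels, its units are then calibrated against labelled data, and only afterwards does the supervised LVQ step adjust the class boundaries. The toy accuracy it prints has no bearing on the 80% figure reported above.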
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.298086
DOI: Not available
Keywords: Acoustic