Title: Temporal characteristics of spoken consonants as discriminants in automatic speech recognition
Author: Green, Philip Duncan
Awarding Body: Keele University
Current Institution: Keele University
Date of Award: 1970
Three time-varying functions, which can be extracted directly from the raw speech waveform, are of importance in the field of automatic speech recognition. These functions are the zero-crossing rate, the turnaround (local maximum or minimum) rate and the amplitude of the speech wave envelope. The aim of the work described here was to assess the feasibility of using these three variables to distinguish between the various consonant phonemes in English speech. The investigation was confined to consonants spoken in isolated consonant-vowel syllables, with the consonant in the initial position. All the consonant phonemes which occur in the initial position in English were spoken with each of ten vowel phonemes by four male speakers. The three functions mentioned above were extracted from the speech wave by computer routines and displayed simultaneously using an on-line C.R.T. display. On these traces, the consonant part of the syllable could be readily distinguished by eye from that of the vowel, and the consonant was normally represented by a single peak on each trace. Further computer routines were evolved to identify these consonant peaks and extract recognition parameters describing the form of the peaks. Mistakes made by these programmes could be corrected manually from observation of the display. An attempt was then made to identify the consonant phoneme, using the values of the recognition parameters. The recognition algorithms took the form of modified binary threshold decision trees, and the task of designing these algorithms to fit new data was mostly automated. Separate algorithms were constructed to recognise the utterances of each of the four speakers. For the appropriate speakers, the performances of these algorithms were very similar, about 65% of the utterances being classified correctly, with a further 25% of 'possibly' or tentatively correct identifications.
The algorithms were, however, greatly speaker dependent, and performance fell off sharply when the speaker was changed. The performance of the algorithms was independent of the vowel spoken after the consonant sound. For each speaker, satisfactory means were found to identify most of the consonant phonemes except the semi-vowel and nasal sounds. Many similarities could be seen between the four recognition algorithms, and it was concluded that the speaker dependence might be reduced by the use of a different type of recognition algorithm coupled with normalisation of the recognition parameters.
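For illustration only, the three waveform functions named in the abstract can be sketched in Python as below. This is not the thesis's own routine, which the abstract does not detail. The function name `frame_features`, the use of fixed non-overlapping frames, the per-sample normalisation of the rates, and the choice of peak absolute amplitude as the envelope measure are all assumptions made for this sketch.

```python
def frame_features(samples, frame_len):
    """Return one (zero_crossing_rate, turnaround_rate, envelope) tuple per frame.

    Hypothetical helper: rates are counts divided by the frame length,
    and frames are consecutive, non-overlapping blocks of `frame_len` samples.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]

        # Zero crossings: consecutive samples that differ in sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )

        # Turnarounds: local maxima or minima, detected where the slope
        # between successive samples changes sign.
        turnarounds = sum(
            1
            for a, b, c in zip(frame, frame[1:], frame[2:])
            if (b - a) * (c - b) < 0
        )

        # Envelope: peak absolute amplitude within the frame (an assumed,
        # deliberately simple estimate of the speech wave envelope).
        envelope = max(abs(x) for x in frame)

        features.append(
            (crossings / frame_len, turnarounds / frame_len, envelope)
        )
    return features
```

Under these assumptions, a rapidly alternating frame gives high zero-crossing and turnaround rates, while a steadily rising frame gives zero for both, which is the kind of contrast the displayed traces would make visible.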
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry