Title: Accent modelling and adaptation in automatic speech recognition
Author: Humphries, J. J.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 1998
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Automatic speech recognition technology has advanced considerably, and today's systems are sufficiently fast, affordable and robust to be useful in a wide range of applications. As the scope of these products increases, so does the range of people using them. The diversity of speaker accents which this brings poses a serious problem for existing technology. Speech recognition systems are generally trained on a specific accent group, such as Standard British English (RP). This work demonstrates that the performance of such systems deteriorates significantly when the accent of the incoming speech differs from that represented by the recogniser (typically more than a 200% increase in word error rate). It is shown that this is attributable to both acoustic and phonological differences between accents. Pronunciation modelling can help overcome phonological differences, and a new scheme is described which builds upon previous work in this area to give a fully automated method for capturing pronunciation variations. This method requires no linguistic intervention and works with modern large vocabulary continuous speech recognisers. An existing (e.g. British) phone-loop recogniser transcribes speech from the new accent region (e.g. American), and the resulting phonetic transcription is compared with standard (British) pronunciations. The phonological differences are then recorded, along with their phonetic context and confidence scores. A statistical pronunciation model of the new accent can then be produced by clustering these observations using binary decision trees. From these, a set of multiple pronunciations, appropriate to the new accent and with associated probabilities, can be generated and used within the original speech recogniser.
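The core of the scheme can be illustrated with a simplified sketch: align canonical pronunciations against the phone-loop recogniser's output, count how each canonical phone is realised in each phonetic context, and convert the counts to variant probabilities. The phone symbols and data below are hypothetical, and the thesis smooths and generalises these context-dependent observations with binary decision trees rather than the raw relative frequencies used here.

```python
from collections import Counter, defaultdict

# Hypothetical aligned data: (canonical phone sequence, surface phone
# sequence from the phone-loop recogniser). A 1:1 alignment is assumed
# for simplicity; a full system must also handle insertions/deletions.
aligned = [
    (["t", "ax", "m", "aa", "t", "ow"], ["t", "ax", "m", "ey", "t", "ow"]),
    (["t", "ax", "m", "aa", "t", "ow"], ["t", "ax", "m", "aa", "t", "ow"]),
]

# Count surface realisations of each canonical phone in its
# left/right phonetic context ("#" marks a word boundary).
counts = defaultdict(Counter)
for canon, surface in aligned:
    for i, (c, s) in enumerate(zip(canon, surface)):
        left = canon[i - 1] if i > 0 else "#"
        right = canon[i + 1] if i + 1 < len(canon) else "#"
        counts[(left, c, right)][s] += 1

def variant_probs(left, phone, right):
    """Relative frequency of each surface phone for a canonical phone
    in a given context (the thesis clusters contexts with decision
    trees to share statistics between similar contexts)."""
    c = counts[(left, phone, right)]
    total = sum(c.values())
    return {s: n / total for s, n in c.items()}

print(variant_probs("m", "aa", "t"))  # e.g. {'ey': 0.5, 'aa': 0.5}
```

Such context-conditioned variant probabilities are what allow the recogniser's dictionary to be expanded with accent-appropriate alternative pronunciations, each carrying a probability.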
This technique has been applied to the speech recognition problem in a range of adaptation and re-training scenarios, using American, British and non-native English speech data, and has been shown to reduce recogniser word error rates by up to 20%. Pronunciation effects captured in an automatically generated model of American English are shown to agree well with linguistic theory, and pronunciations synthesised from this model are shown to be similar to canonical American pronunciations. A scheme for the integration of pronunciation adaptation with acoustic adaptation (specifically MLLR) has also been presented and shown to be effective in producing reductions in recogniser word error rates of as much as 40%. The value of syllable and cross-word information in the accent model was also evaluated.
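The acoustic side of the combined scheme, MLLR, adapts a recogniser by passing its Gaussian mean vectors through a shared affine transform, mu' = A mu + b. A minimal sketch of that mean update follows; the values of A and b here are placeholders, whereas real MLLR estimates the transform from adaptation data by maximum likelihood, typically tying one transform across many Gaussians.

```python
# Illustrative MLLR-style mean adaptation: every Gaussian mean in the
# acoustic model is mapped through one shared affine transform
# mu' = A mu + b. A and b below are hypothetical placeholder values;
# real MLLR estimates them from adaptation data by maximum likelihood.
A = [[1.1, 0.0],
     [0.0, 1.1]]        # hypothetical 2x2 transform matrix
b = [0.05, -0.02]       # hypothetical bias vector

def adapt_mean(mu):
    """Apply mu' = A mu + b to one Gaussian mean vector."""
    return [sum(a * m for a, m in zip(row, mu)) + bi
            for row, bi in zip(A, b)]

adapted = adapt_mean([1.0, 2.0])
print(adapted)
```

Because one transform is shared across many Gaussians, MLLR can adapt a large acoustic model from small amounts of accent-specific data, which is why it pairs naturally with the pronunciation adaptation described above.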
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available