Title: Analysis of the timing of spoken Korean with application to speech synthesis
Author: Chung, Hyunsong
ISNI: 0000 0001 3550 7125
Awarding Body: University of London
Current Institution: University College London (University of London)
Date of Award: 2002
Availability of Full Text: Full text unavailable from EThOS.
The thesis describes new analysis and modelling of Korean segmental duration. It takes into account contemporary approaches to duration modelling, as used in English and Japanese synthesis, to build predictive models of segment duration in context which could be used in Korean-language text-to-speech (TTS) systems. It also analyses those models to learn more about which factors and which structures are most important in Korean prosody. The thesis concentrates on the duration modelling of a news-reading speech style, using a corpus of 670 read sentences collected from one speaker of standard Korean. The duration of each segment and its phonological context were extracted from the corpus. Statistical modelling explored the relationship between the context features and the realised duration. Based on previous research on timing, Sums-of-Products models and Classification And Regression Tree (CART) models were applied and evaluated on the data. Objective quality of the modelling was evaluated by root mean squared prediction error (RMSE) and the correlation coefficient between actual and predicted durations in reserved test data. The best performance was obtained from a CART model with an RMSE of 25.11 ms and a correlation of 0.77, a result comparable with other published results on Korean segment durations. Analysis showed that prosodic phrase features have the greatest influence on segment duration, most notably the accentual phrase final position feature. In terms of segmental context, surrounding nasals were shown to have a consistent shortening effect, while vowels seemed to be affected by the degree of glottal opening of adjacent consonants. Other segmental effects were less consistent. Perceptual tests showed a slight listener preference for durations calculated from a CART model in this thesis over durations calculated from a commercial Korean TTS system.
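The two objective measures named in the abstract, RMSE and the correlation coefficient between actual and predicted durations, can be illustrated with a minimal Python sketch. The function names `rmse` and `pearson_r` and the sample durations are hypothetical and chosen only for illustration; they are not taken from the thesis or its corpus, and the correlation is assumed here to be the Pearson coefficient.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length duration lists (ms)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted durations."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    # Made-up segment durations in milliseconds, for illustration only.
    actual_ms = [62.0, 95.0, 48.0, 130.0, 77.0]
    predicted_ms = [70.0, 88.0, 55.0, 110.0, 80.0]
    print(f"RMSE: {rmse(actual_ms, predicted_ms):.2f} ms")
    print(f"Correlation: {pearson_r(actual_ms, predicted_ms):.2f}")
```

RMSE is expressed in the same unit as the durations (milliseconds), so it reads as a typical size of prediction error, while the correlation measures how well predicted durations track the actual ones regardless of scale.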
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Linguistics