Multi-level speech timing control.
This thesis describes a model of speech timing that predicts durations at the syllable
level, with sensitivity to rhythmic factors at the foot level, and then derives segmental
durations by accommodating them within this higher-level timing framework.
The model is based on analyses of two large databases of British English speech:
one illustrating the range of prosodic variation in the language, the other illustrating
segmental duration characteristics in various phonetic environments. Designed
for a speech synthesis application, the model is also relevant to linguistic and
phonetic theory: it shows that the phonological specification of prosodic variation is
independent of the phonetic realisation of segmental duration. It also shows, using
normalisation of phone-specific timing characteristics, that lengthening of segments
within the syllable is of three kinds: prominence-related, applying more to onset segments;
boundary-related, applying more to coda segments; and rhythm/rate-related,
being more uniform across all component segments.
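The normalisation step can be illustrated with a minimal sketch. The abstract does not say which normalisation is used, so this assumes a per-phone z-score: each duration is expressed in standard deviations from the mean for that phone, which removes intrinsic differences between phones before lengthening effects in onsets and codas are compared. The function names and the toy durations (in ms) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def phone_stats(tokens):
    """Mean and (population) standard deviation of duration for each phone label.

    tokens: iterable of (phone_label, duration_ms) pairs (hypothetical format).
    """
    by_phone = defaultdict(list)
    for phone, dur in tokens:
        by_phone[phone].append(dur)
    return {p: (mean(ds), pstdev(ds)) for p, ds in by_phone.items()}


def normalise(phone, dur, stats):
    """Assumed z-score normalisation: distance from the phone's own mean, in SDs."""
    mu, sd = stats[phone]
    return (dur - mu) / sd if sd > 0 else 0.0


# Toy data: /t/ is intrinsically shorter than /n/ here.
stats = phone_stats([("t", 40), ("t", 60), ("n", 50), ("n", 70)])
print(normalise("t", 60, stats))  # 1.0
print(normalise("n", 60, stats))  # 0.0
```

The same raw 60 ms counts as lengthened for /t/ but as typical for /n/, which is why comparing lengthening across different phones calls for normalised rather than raw durations.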
In this model, durations are first predicted at the level of the syllable from the number
of component segments, the nature of the rhyme, and the three types of lengthening.
The segmental durations are then constrained to sum to this value by finding a single
quantile, common to all segments in the syllable, of their individual distributions.
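The accommodation step can be sketched as a one-dimensional search. Every segment's quantile function increases with p, so the summed segment duration does too, and bisection finds the single p at which the segments add up to the syllable duration. This is a sketch under assumptions: the shifted-exponential quantile functions and all numbers are hypothetical stand-ins for the model's fitted segment distributions, and `accommodate` is an illustrative name.

```python
import math
from typing import Callable, List, Sequence, Tuple

Quantile = Callable[[float], float]  # maps p in (0, 1) to a duration in ms


def accommodate(quantiles: Sequence[Quantile], syllable_dur: float,
                tol: float = 1e-9, max_iter: int = 200) -> Tuple[float, List[float]]:
    """Find one quantile p, shared by all segments, whose durations sum to syllable_dur."""
    def total(p: float) -> float:
        return sum(q(p) for q in quantiles)

    lo, hi = 1e-9, 1.0 - 1e-9
    if not total(lo) <= syllable_dur <= total(hi):
        raise ValueError("syllable duration is outside the range the segments can reach")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # total() is monotone in p, so keep the half that brackets the target.
        if total(mid) < syllable_dur:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    p = 0.5 * (lo + hi)
    return p, [q(p) for q in quantiles]


def shifted_exponential(floor: float, scale: float) -> Quantile:
    """Hypothetical segment distribution: a minimum duration plus an exponential tail."""
    return lambda p: floor + scale * -math.log1p(-p)


segments = [shifted_exponential(30, 20), shifted_exponential(40, 40),
            shifted_exponential(25, 15)]
p, durations = accommodate(segments, 200.0)
```

With floors summing to 95 ms and scales to 75 ms, a 200 ms syllable forces -ln(1 - p) = 1.4, so p is about 0.753 and the segments receive about 58, 96 and 46 ms: each segment is stretched in proportion to its own spread rather than by a fixed amount.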
Segmental distributions define the range of likely durations each segment might show
under a given set of conditions; their parameters are predicted from broad-class
features of place and manner of articulation, factored for position in the syllable,
clustering, stress, and finality. Assuming a Gamma distribution, two parameters
determine each segment's duration pdf, and one further parameter determines the
quantile within that pdf, and hence the predicted duration of any segment in a given
prosodic context.
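A Gamma duration pdf and its quantile can be computed with the Python standard library alone. This sketch assumes a shape and scale parameterisation (mean = shape × scale), evaluates the CDF with the series expansion of the regularised lower incomplete gamma function, and finds the quantile by bracketing and bisection. The function names and parameter values are illustrative, not taken from the thesis.

```python
import math


def gamma_cdf(d: float, shape: float, scale: float) -> float:
    """P(duration <= d) under an assumed Gamma(shape, scale) duration pdf."""
    if d <= 0:
        return 0.0
    x = d / scale
    # Lower incomplete gamma: x^a e^-x * sum_n x^n / (a (a+1) ... (a+n)).
    term = total = 1.0 / shape
    n = 0
    while abs(term) > total * 1e-15 and n < 10_000:
        n += 1
        term *= x / (shape + n)
        total += term
    # Divide by Gamma(a) in log space to avoid overflow.
    log_prefactor = shape * math.log(x) - x - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_quantile(p: float, shape: float, scale: float, rel_tol: float = 1e-12) -> float:
    """Duration d with gamma_cdf(d) == p: the duration predicted at quantile p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, shape * scale  # start the upper bracket at the mean
    while gamma_cdf(hi, shape, scale) < p:
        lo, hi = hi, 2.0 * hi    # the quantile lies above hi, so move the bracket up
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `gamma_quantile(0.8, 6.0, 12.0)` gives the duration of a hypothetical segment with a 72 ms mean at the 0.8 quantile. A quantile function of this kind, one per segment, is what the uniform-quantile accommodation of segments to the syllable duration requires.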
In experimental tests, each level produced durations that closely fitted the data
of four speakers of British English, and achieved higher performance rates than a
comparable model that predicts exclusively at the level of the segment.