Title: Modelling segmental variability for automatic speech recognition
Author: Holmes, Wendy J.
ISNI: 0000 0001 3580 8702
Awarding Body: University of London
Current Institution: University College London (University of London)
Date of Award: 1997
Availability of Full Text: Full text unavailable from EThOS.
This thesis describes work on developing an approach to automatic speech recognition that incorporates a more realistic underlying model of speech than the currently successful technique of hidden Markov models (HMMs). Whereas HMM states are associated with individual (frame-based) feature vectors, the new models represent sequences, or "segments", of features. The segment models described in this thesis are referred to as "segmental HMMs"; they incorporate the concept of trajectories to describe how features change over time, together with a novel representation of segmental variability. Extra-segmental variability between different examples of a sub-phonemic speech segment is modelled separately from intra-segmental variability within any one example. The extra-segmental component of the model is represented in terms of variability in the trajectory parameters, and can be regarded as providing a prior constraint on the possible observation sequences that the model can generate. The work that forms the basis for the thesis has concentrated on investigating how the two types of variability are represented, in relation both to the characteristics of speech data and to recognition performance. Experiments have demonstrated that a segmental HMM can improve recognition performance, both for a connected-digit recognition task and for a phonetic classification task. However, the model worked well only when its assumptions were a reasonable approximation to the characteristics of real speech. Firstly, both the extra- and intra-segmental model distributions had to be fairly accurate across all segment durations, so that the two types of probability would balance appropriately in recognition tasks.
In addition, the trajectory descriptions needed to be reasonably accurate: segmental HMMs of sub-phone speech segments gave performance advantages with a linear trajectory representation but not with a static trajectory description. The thesis concludes by discussing the experimental findings in relation to several design issues for developing a segmental model that truly reflects the characteristics of real speech signals.
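The split between extra- and intra-segmental variability described above can be illustrated with a minimal sketch. It assumes a single feature dimension, Gaussian distributions throughout, a linear trajectory parameterised by its midpoint and slope, and scoring by the most likely trajectory for the segment rather than integrating over all trajectories. The class `SegmentModel`, its fields and the method `segment_log_probs` are hypothetical names chosen for illustration; this is not the formulation or code used in the thesis.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


@dataclass
class SegmentModel:
    """Hypothetical one-dimensional segmental HMM state (illustrative only).

    Extra-segmental variability: trajectory midpoint m ~ N(mid_mean, mid_var)
    and slope c ~ N(slope_mean, slope_var), drawn once per segment.
    Intra-segmental variability: each frame y_t ~ N(m + c * tau_t, noise_var),
    where tau_t is time measured from the centre of the segment.
    """
    mid_mean: float
    mid_var: float
    slope_mean: float
    slope_var: float
    noise_var: float
    linear: bool = True  # False gives a static trajectory (slope fixed at 0)

    def segment_log_probs(self, frames: Sequence[float]) -> Tuple[float, float]:
        """Return (extra, intra) log-probabilities at the most likely trajectory."""
        T = len(frames)
        if T == 0:
            raise ValueError("a segment needs at least one frame")
        centre = (T - 1) / 2.0
        taus = [t - centre for t in range(T)]  # centred time axis, sums to zero

        # With a centred time axis the most likely midpoint and slope decouple:
        # each is a precision-weighted average of its prior mean and the data.
        m = (self.mid_mean / self.mid_var + sum(frames) / self.noise_var) / (
            1.0 / self.mid_var + T / self.noise_var
        )
        extra = gaussian_logpdf(m, self.mid_mean, self.mid_var)
        c = 0.0
        if self.linear:
            sum_ty = sum(tau * y for tau, y in zip(taus, frames))
            sum_tt = sum(tau * tau for tau in taus)
            c = (self.slope_mean / self.slope_var + sum_ty / self.noise_var) / (
                1.0 / self.slope_var + sum_tt / self.noise_var
            )
            extra += gaussian_logpdf(c, self.slope_mean, self.slope_var)

        # Frame-by-frame deviations from the chosen trajectory.
        intra = sum(
            gaussian_logpdf(y, m + c * tau, self.noise_var)
            for tau, y in zip(taus, frames)
        )
        return extra, intra


if __name__ == "__main__":
    rising = [1.0, 1.5, 2.0, 2.5, 3.0]  # a feature that rises steadily
    linear = SegmentModel(2.0, 1.0, 0.5, 0.25, 0.1, linear=True)
    static = SegmentModel(2.0, 1.0, 0.5, 0.25, 0.1, linear=False)
    print(sum(linear.segment_log_probs(rising)) > sum(static.segment_log_probs(rising)))  # True
```

In this sketch the extra-segmental term is added once per segment while the intra-segmental term sums over every frame, so their relative weight shifts with segment duration; this is one way to see why both distributions need to be reasonably accurate at all durations for the two kinds of probability to balance. The usage example also shows a linear trajectory fitting a rising feature track much better than a static one, under these assumed toy parameters.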
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Pattern recognition & image processing