Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.633225
Title: An acoustic model for speech recognition with an articulatory layer and non-linear articulatory-to-acoustic mapping
Author: Lo, Boon Hooi
Awarding Body: University of Birmingham
Current Institution: University of Birmingham
Date of Award: 2004
Abstract:
This thesis presents an extended hidden Markov model (HMM), namely the linear/non-linear multi-level segmental hidden Markov model (linear/non-linear MSHMM). In the MSHMM framework, the relationship between symbolic and acoustic representations of a speech signal is regulated by an intermediate, articulatory-based layer. Such an approach has many potential advantages for speech pattern processing. By modelling speech dynamics directly in an articulatory domain, it may be possible to characterise the articulatory phenomena which give rise to variability in speech. The intermediate representations are based on the first three formant frequencies. The speech dynamics in the formant representation of each segment are modelled as fixed linear trajectories which characterise the distribution of formant frequencies. These trajectories are mapped into the acoustic feature space by a set of one or more non-linear mappings, hence the name linear/non-linear MSHMM. This thesis describes work developing a non-linear transformation approach that uses a Radial Basis Function (RBF) network for the articulatory-to-acoustic mapping. An RBF network consists of a number of hidden units and a set of mapping weights for the linear transform component of the network. Each hidden unit is associated with a 'Gaussian-like' distribution. The thesis presents the training and optimisation processes for the parameters of the RBF network. The linear/non-linear MSHMMs, which form the basis for the thesis, are incorporated into an automatic speech recognition system. A gradient descent process is used to find the optimal parameters of the linear trajectory models during the Viterbi training process. Phone classification experiments are presented for monophone MSHMMs using the TIMIT database. The linear/non-linear MSHMM is compared with the linear/linear MSHMM, in which both the model of dynamics and the articulatory-to-acoustic mappings are linear. The comparison shows no statistically significant difference in performance between the two models.
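The two stages described in the abstract, a linear trajectory in formant space followed by an RBF mapping into acoustic space, can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the midpoint-and-slope parameterisation of the trajectory, the per-unit width, the bias term, and all numeric values are assumptions made only for illustration.

```python
import math


def linear_trajectory(midpoint, slope, num_frames):
    """Return num_frames points on a straight line centred on `midpoint`.

    Assumption: the trajectory is parameterised by a midpoint and a
    per-frame slope for each formant; the thesis may use another form.
    """
    centre = (num_frames - 1) / 2.0
    return [
        [m + s * (t - centre) for m, s in zip(midpoint, slope)]
        for t in range(num_frames)
    ]


def rbf_map(x, centres, widths, weights, bias):
    """Map one formant vector to an acoustic vector with an RBF network."""
    # Hidden layer: one Gaussian-like activation per hidden unit.
    hidden = []
    for c, w in zip(centres, widths):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        hidden.append(math.exp(-sq_dist / (2.0 * w ** 2)))
    # Output layer: linear transform of the hidden activations.
    # The bias term is an assumption, not stated in the abstract.
    return [
        b + sum(wj * hj for wj, hj in zip(row, hidden))
        for row, b in zip(weights, bias)
    ]


def map_segment(midpoint, slope, num_frames, centres, widths, weights, bias):
    """Generate the acoustic-space trajectory for one segment."""
    points = linear_trajectory(midpoint, slope, num_frames)
    return [rbf_map(p, centres, widths, weights, bias) for p in points]


# Illustrative values only: three formants (Hz), one hidden unit,
# two acoustic output dimensions.
formants = linear_trajectory([500.0, 1500.0, 2500.0], [10.0, 0.0, -20.0], 3)
acoustic = map_segment(
    [500.0, 1500.0, 2500.0], [10.0, 0.0, -20.0], 3,
    centres=[[500.0, 1500.0, 2500.0]],
    widths=[100.0],
    weights=[[2.0], [-1.0]],
    bias=[0.5, 0.0],
)
```

Replacing `rbf_map` with a single matrix multiplication would give the linear/linear variant that the abstract uses as the comparison model.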
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.633225
DOI: Not available