Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.661116
Title: Estimating articulatory parameters from the acoustic speech signal
Author: Richmond, Korin
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2002
Availability of Full Text: Full text unavailable from EThOS.
Abstract:
Among the data-driven models employed in previous studies, the feedforward multilayer perceptron (MLP) in particular has been used several times with promising results. Researchers have cited advantages in memory requirement and execution speed as a significant factor motivating its use. Furthermore, the MLP is well known as a universal function approximator: an MLP of suitable form can in theory represent an arbitrary mapping function. Therefore, using an MLP in conjunction with the relatively large quantities of acoustic-articulatory data arguably represents a promising and useful first research step for the current thesis, and a significant part of the thesis is devoted to doing so.
Having demonstrated an MLP which performs well enough to provide a reasonable baseline, we go on to critically evaluate the suitability of the MLP for the inversion mapping, with the aim of finding ways to improve modelling accuracy further. A key consideration is what model of the target articulatory domain the MLP provides. It has been shown that the outputs of an MLP trained with the sum-of-squares error function approximate the mean of the target data points conditioned on the input vector. In many situations, this is an appropriate and sufficient solution. In other cases, however, the conditional mean is an unduly limiting model of data in the target domain, particularly for ill-posed problems where the mapping may be multi-valued. Substantial evidence shows that multiple articulatory configurations can produce the same acoustic signal. A system intended to map from a point in acoustic space to the articulatory domain can therefore be faced with multiple candidate articulatory configurations. Consequently, despite the impressive ability of the MLP to model mapping functions, it may prove inadequate in certain respects for performing the acoustic-to-articulatory inversion mapping.
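The limitation of the conditional mean on a multi-valued mapping can be illustrated with a small toy example. The sketch below is not taken from the thesis: the data are synthetic, and the names `sum_of_squares` and `best_constant` are hypothetical helpers introduced only for illustration. For a single fixed input whose targets fall on two branches (near -1 and near +1), it searches for the constant prediction with the lowest sum-of-squares error and compares it with the mean of the targets.

```python
import random


def sum_of_squares(prediction, targets):
    """Sum-of-squares error of one constant prediction against all targets."""
    return sum((prediction - t) ** 2 for t in targets)


def best_constant(targets, lo=-2.0, hi=2.0, steps=401):
    """Grid search for the constant prediction with the lowest error.

    Hypothetical helper: it stands in for what a trained network would
    output for one fixed input vector.
    """
    candidates = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(candidates, key=lambda c: sum_of_squares(c, targets))


# Synthetic multi-valued data (an assumption, not thesis data): the same
# input is paired with targets on two branches, near -1 and near +1.
rng = random.Random(0)
targets = [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.05) for _ in range(2000)]

mean_target = sum(targets) / len(targets)
best = best_constant(targets)
nearest_gap = min(abs(best - t) for t in targets)

print("mean of targets:          ", round(mean_target, 3))
print("least-squares prediction: ", round(best, 3))
print("gap to nearest real target:", round(nearest_gap, 3))
```

In this toy setting the least-squares prediction coincides with the mean of the targets (up to the grid resolution) and lies between the two branches, far from any target that was actually observed.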
Mixture density networks (MDNs) provide a principled method to model arbitrary probability density functions over the target domain, conditioned on the input vector. In theory, therefore, the MDN offers a superior model of the target domain compared to the MLP. We hypothesise that this advantage will prove beneficial in the case of the acoustic-to-articulatory inversion mapping. Accordingly, this thesis aims to test this hypothesis by directly comparing the performance of the MDN with that of the MLP on exactly the same acoustic-to-articulatory inversion task.
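A minimal sketch of the general MDN idea follows, assuming a one-dimensional target and a mixture of Gaussians whose parameters come from raw network outputs (softmax for the mixing weights, exponential for the standard deviations). This is the commonly used formulation and is not claimed to be the configuration used in the thesis; the function names and the example parameter values are hypothetical.

```python
import math


def gaussian_pdf(t, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def softmax(logits):
    """Numerically stable softmax, so the mixing weights sum to one."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mdn_params(raw_outputs, n_components):
    """Split raw network outputs into (weights, means, sigmas).

    Assumed layout: [logits..., means..., log-sigmas...].
    """
    k = n_components
    weights = softmax(raw_outputs[:k])
    means = list(raw_outputs[k:2 * k])
    sigmas = [math.exp(s) for s in raw_outputs[2 * k:3 * k]]
    return weights, means, sigmas


def mixture_nll(t, weights, means, sigmas):
    """Negative log-likelihood of target t under the Gaussian mixture."""
    density = sum(w * gaussian_pdf(t, m, s)
                  for w, m, s in zip(weights, means, sigmas))
    return -math.log(density)


# Two targets observed for the same input (illustrative values only).
targets = [-1.0, 1.0]

# Two-component mixture with one narrow component on each branch.
two_comp = mdn_params([0.0, 0.0, -1.0, 1.0, math.log(0.1), math.log(0.1)], 2)
# Single broad Gaussian centred on the conditional mean.
one_comp = mdn_params([0.0, 0.0, math.log(1.0)], 1)

nll_two = sum(mixture_nll(t, *two_comp) for t in targets)
nll_one = sum(mixture_nll(t, *one_comp) for t in targets)
print("NLL, two components:", round(nll_two, 3))
print("NLL, one component: ", round(nll_one, 3))
```

On this two-branch example the two-component mixture assigns a lower negative log-likelihood to the targets than the single Gaussian centred on their mean, which is the kind of advantage the hypothesis above refers to.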
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.661116
DOI: Not available