Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.639110
Title: Non-linear prediction for speech processing
Author: Stevens, D. A.
Awarding Body: University College of Swansea
Current Institution: Swansea University
Date of Award: 1995
Abstract:
For over 20 years, linear prediction has been one of the most widely used methods for analysing speech signals. Linear predictors have been used to model the vocal tract in all areas of speech processing, from speech recognition to speech synthesis. However, as early as 1980 Teager showed, by measuring the flow within the vocal tract during the pronunciation of a vowel sound, that the vocal tract is a non-linear system. As such, the standard linear predictors are unable to model all the vocal tract information available in the speech signal. This work looks at replacing or complementing the standard linear models with non-linear ones in order to improve the modelling of the vocal tract. Several methods of both generating and implementing non-linear models of the vocal tract are assessed to see how much improvement in prediction they can achieve, whether used in place of the standard linear models or alongside them. Two basic approaches to non-linear prediction have been used. The first is to configure a multi-layered perceptron (MLP) as a non-linear predictor and then train the MLP to predict the speech signal. The second is known as a split function approach, as it effectively splits the overall predictor function into smaller sub-functions, each of which requires a less complex predictor function than the whole. This second method uses a classification stage to determine what type of speech is present and then uses a separate predictor for each of the classifications. Initial results using a single MLP predictor proved ineffective, returning gains of only 0.1 to 0.3 dB in excess of the standard LPC. This is thought to be due to an inability of the networks used to model the full dynamic complexity of the speech signal. However, with the split function predictors it is shown that relatively high prediction gains can be achieved using a few simple sub-functions. With four linear sub-functions, gains of 2.1 dB have been achieved over the standard LPC.
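The split function idea in the abstract can be illustrated with a minimal sketch, which is not the thesis's implementation. Everything specific here is an assumption made for illustration: the test signal is a synthetic piecewise-linear process rather than speech, the classifier (`classify`, based on the sign of the most recent sample) is hypothetical, only two classes and a predictor order of 2 are used, and prediction gain is taken as the usual ratio of signal power to residual power in dB.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_linear_predictor(frames, targets):
    """Least-squares coefficients c so that target ~ sum(c[k] * frame[k])."""
    p = len(frames[0])
    a = [[sum(f[i] * f[j] for f in frames) for j in range(p)] for i in range(p)]
    for i in range(p):
        a[i][i] += 1e-9  # tiny ridge term keeps the normal equations solvable
    b = [sum(f[i] * t for f, t in zip(frames, targets)) for i in range(p)]
    return solve(a, b)


def predict(coeffs, frame):
    """Predict the next sample as a weighted sum of the past samples."""
    return sum(c * s for c, s in zip(coeffs, frame))


def prediction_gain_db(targets, predictions):
    """Signal power over prediction-error power, in dB."""
    signal_power = sum(t * t for t in targets)
    error_power = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return 10.0 * math.log10(signal_power / error_power)


def classify(frame):
    """Hypothetical classification stage: sign of the most recent sample."""
    return 0 if frame[0] >= 0 else 1


def make_signal(n, seed=0):
    """Synthetic non-linear test signal (an assumption, not speech data)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        prev = x[-1]
        slope = 0.8 if prev >= 0 else -0.6  # behaviour depends on the regime
        x.append(slope * prev + rng.gauss(0.0, 1.0))
    return x


def compare_predictors(signal, order=2):
    """Return (single, split) prediction gains in dB, measured in-sample."""
    frames = [[signal[n - k] for k in range(1, order + 1)]
              for n in range(order, len(signal))]
    targets = signal[order:]

    # One linear predictor for the whole signal.
    single = fit_linear_predictor(frames, targets)
    single_pred = [predict(single, f) for f in frames]

    # Split function: classify each frame, then fit one predictor per class.
    groups = {}
    for f, t in zip(frames, targets):
        class_frames, class_targets = groups.setdefault(classify(f), ([], []))
        class_frames.append(f)
        class_targets.append(t)
    sub = {label: fit_linear_predictor(fs, ts)
           for label, (fs, ts) in groups.items()}
    split_pred = [predict(sub[classify(f)], f) for f in frames]

    return (prediction_gain_db(targets, single_pred),
            prediction_gain_db(targets, split_pred))


single_gain, split_gain = compare_predictors(make_signal(2000))
print(f"single linear predictor:  {single_gain:.2f} dB")
print(f"split-function predictor: {split_gain:.2f} dB")
```

Because the gains are measured on the same samples used for fitting, the split predictor cannot do meaningfully worse than the single one here. The size of the improvement on this toy signal says nothing about the figures reported for speech in the abstract.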
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.639110
DOI: Not available