Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.775225
Title: Psychoacoustics modelling and the recognition of silence in recorded speech
Author: Wilson, Derek
Awarding Body: Newcastle University
Current Institution: University of Newcastle upon Tyne
Date of Award: 2018
Abstract:
Many computer models intended to capture the essential differences between silence and speech have been investigated over the years; nevertheless, research into a different audio model may provide fresh insight. Inspired by the unsurpassed human capability to differentiate between silence and speech under virtually any conditions, this work evaluated a dynamic psychoacoustics model within a two stage binary speech/silence non-linear classification system. The model has a temporal resolution an order of magnitude greater than that of the typical Mel Frequency Cepstral Coefficients model, and implements simultaneous masking around the most powerful harmonic in each of 24 Bark frequency bands. The first classification stage (deterministic) provided training data for the second stage (heuristic), which was implemented using a Deep Neural Network (DNN). The literature on speech processing and DNNs asserts that performance improvements obtained with a 'standard' speech corpus do not always generalise. Accordingly, six new test-cases were recorded; because this corpus implicitly included frequency normalisation, it was feasible to assess whether the solution generalised, and all of the test-cases were successfully processed by any of the six trained DNNs. In other tests, the performance of the two stage silence/speech classifier exceeded that of the silence/speech classifiers discussed in the Literature Review. However, the Split Sample Technique for neural net training did not always identify the optimal trained network; to correct this, an additional step in the training process was devised and tested.
Overall, the results conclusively demonstrate that the combination of the dynamic psychoacoustics model with the two stage binary speech/silence non-linear classification system provides a viable alternative to existing methods of detecting silence in speech.
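As a rough illustration of the "most powerful harmonic in each of 24 Bark frequency bands" mentioned in the abstract, the sketch below groups the FFT bins of one audio frame into the 24 conventional critical bands and picks the strongest bin in each. It is not the thesis's implementation: the band edges are the commonly tabulated Zwicker values, the function name, Hann window and frame length are assumptions made for this example, and the simultaneous masking step itself is not reproduced.

```python
import numpy as np

# Commonly tabulated critical-band edges in Hz (assumed here; the thesis
# may use a different Bark mapping). 25 edges delimit 24 bands.
BARK_EDGES_HZ = np.array(
    [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
     2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000,
     15500],
    dtype=float,
)


def strongest_component_per_band(frame, sample_rate):
    """Hypothetical helper: strongest FFT bin in each of 24 Bark bands.

    Returns (peak_freq_hz, peak_power), each of length 24. Bands that
    contain no FFT bin are reported as NaN frequency and zero power.
    """
    windowed = frame * np.hanning(len(frame))  # window choice is an assumption
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Band index of every bin; bins at or above the last edge get index 24
    # and are ignored by the loop below.
    band = np.searchsorted(BARK_EDGES_HZ, freqs, side="right") - 1

    peak_freq = np.full(24, np.nan)
    peak_power = np.zeros(24)
    for b in range(24):
        idx = np.flatnonzero(band == b)
        if idx.size:
            k = idx[np.argmax(power[idx])]
            peak_freq[b] = freqs[k]
            peak_power[b] = power[k]
    return peak_freq, peak_power
```

A masking model would then attenuate or discard the weaker components near each band's peak; how the thesis does this is not stated in the abstract.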
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.775225
DOI: Not available