Title: Spectral subtraction for speech enhancement and automatic speech recognition
Author: Evans, N. W. D.
Awarding Body: University of Wales Swansea
Current Institution: Swansea University
Date of Award: 2004
The contributions made in this thesis relate to an extensive investigation of spectral subtraction in the context of speech enhancement and noise-robust automatic speech recognition (ASR), and to the morphological processing of speech spectrograms. Three sources of error in a spectral subtraction approach are identified and assessed with ASR. The effects of phase, cross-term component and spectral magnitude errors are assessed in a common spectral subtraction framework. ASR results confirm that, except for extreme noise conditions, phase and cross-term component errors are relatively negligible compared to noise estimate errors. A topology classifying approaches to spectral subtraction into power and magnitude, linear and non-linear spectral subtraction is proposed. Each class is assessed and compared under otherwise identical experimental conditions. These experiments are thought to be the first to assess the four combinations under such controlled conditions. ASR results illustrate a lesser sensitivity to noise over-estimation for non-linear approaches. With a view to practical systems, different approaches to noise estimation are investigated. In particular, approaches that do not require explicit voice activity detection are assessed and shown to compare favourably to the conventional approach, which does require it. Following on from this finding, a new computationally efficient approach to noise estimation that does not require explicit voice activity detection is proposed. Investigations into the fundamentals of spectral subtraction highlight the limitation of noise estimates: statistical estimates obtained from a number of analysis frames lead to relatively poor representations of the instantaneous values. To ameliorate this situation, estimates from neighbouring, lateral frequencies are used to complement within-bin (from the same frequency) statistical approaches. Improvements are found to be negligible.
However, the principle of these lateral estimates leads naturally to the final stage of the work presented in this thesis, that of morphologically filtering speech spectrograms. This form of processing is examined for both synthesised and speech signals and promising ASR performance is reported. In 2000 the Aurora 2 database was introduced by the organisers of a special session at Eurospeech 2001 entitled ‘Noise Robust Recognition’, aimed at providing a standard database and experimental protocols for the assessment of noise-robust ASR. This facility, when it became available, was used for the work described in this thesis.
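The distinction the abstract draws between magnitude and power spectral subtraction can be illustrated with a minimal per-frame sketch. This is not the thesis's implementation: the function name `spectral_subtract`, the parameters `exponent`, `alpha` and `floor`, and the flooring rule are all assumptions chosen for illustration, and only the linear case (a fixed over-subtraction factor) is shown. The noisy phase is assumed to be reused unchanged and cross-terms between speech and noise are ignored.

```python
def spectral_subtract(noisy_mag, noise_mag, exponent=2.0, alpha=1.0, floor=0.01):
    """Per-bin spectral subtraction for one analysis frame (illustrative only).

    noisy_mag : magnitude spectrum of the noisy frame, one value per bin
    noise_mag : estimated noise magnitude spectrum, same length
    exponent  : 1.0 for magnitude subtraction, 2.0 for power subtraction
    alpha     : hypothetical fixed over-subtraction factor (linear case)
    floor     : hypothetical spectral floor, as a fraction of the noisy spectrum
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        # Subtract the scaled noise estimate in the chosen domain.
        diff = y ** exponent - alpha * n ** exponent
        # An over-estimated noise value can make diff negative, so clamp
        # it to a small fraction of the noisy spectrum in the same domain.
        floored = max(diff, floor * y ** exponent)
        # Return to the magnitude domain.
        enhanced.append(floored ** (1.0 / exponent))
    return enhanced


# Second bin: the noise estimate exceeds the observation, so the floor applies.
noisy = [5.0, 1.0]
noise = [3.0, 2.0]
power_result = spectral_subtract(noisy, noise, exponent=2.0)
magnitude_result = spectral_subtract(noisy, noise, exponent=1.0)
```

In the second bin the noise estimate is larger than the observed value, which is the over-estimation case the abstract refers to; here it is handled only by the assumed floor, whereas non-linear schemes would vary the amount subtracted.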
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available