Title:
|
Speech recognition in noise using weighted matching algorithms
|
This thesis investigates automatic speech recognition in additive and convolutional noise through the development of Weighted Matching Algorithms (WMA). The WMA approach relies on the fact that additive noise corrupts some segments of the speech signal more severely than others. Accordingly, WMA revises the classical concept of acoustic pattern matching to take the segmental signal-to-noise ratio (SNR) into account frame by frame. End-point detection is also addressed, and a method based on autoregressive analysis of the noise is proposed for robust speech pulse detection. The technique is shown to be effective in increasing the discriminability between the speech signal and background noise. Modified versions of the Dynamic Time Warping (DTW) and Hidden Markov Model (HMM) algorithms are proposed and tested in combination with weighting by the reliability in noise cancelling, first using a novel noise-cancelling neural net, the Lateral Inhibition Neural Net (LIN), and then using spectral subtraction (SS). The reliability in noise cancelling is a function of the local SNR and is intended to measure how reliable the information provided by the noise-cancelling technique is. A model for additive noise is proposed, with the suggestion that the hidden clean signal information should be treated as a stochastic variable. This model is applied to estimate the uncertainty in noise cancelling using SS in a Mel filter bank, and this uncertainty (the inverse of reliability) is used to compute the weighting coefficient in the modified DTW or Viterbi (HMM) algorithms. The uncertainty, expressed as a variance, is mainly caused by the lack of knowledge about the phase difference between the noise and the clean signal. The model for additive noise also suggests that SS could be defined as the expected value of the hidden clean signal energy in the log domain, given the noisy energy and the estimated noise energy.
Weighting by the reliability in noise cancelling is tested in an isolated word recognition task (digits) with several types of noise, and is shown to substantially reduce the error rate when SS is used to remove additive noise with only a poor estimate of the corrupting signal. The weighted Viterbi (HMM) algorithm is compared and combined with state duration modelling. It is shown that weighting the time-varying signal information requires only a low computational load and leads to better results than introducing temporal constraints into the recognition algorithm. In combination with temporal constraints, the weighted Viterbi algorithm achieves high recognition accuracy at moderate SNRs without an accurate noise model.
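As an illustration of the weighted matching idea summarised above, the following minimal sketch shows one way a per-frame reliability weight could enter a DTW recursion. It is not the formulation used in the thesis: the sigmoid mapping from local SNR to weight, its parameters `midpoint_db` and `slope`, the squared Euclidean frame distance, and the normalisation by the accumulated weight along the path are all assumptions made for this example.

```python
import math


def reliability_weight(snr_db, midpoint_db=5.0, slope=0.5):
    """Map a local SNR in dB to a weight in (0, 1).

    Hypothetical sigmoid mapping, chosen only for illustration;
    the thesis derives its weighting from an uncertainty variance.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def frame_distance(a, b):
    """Squared Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_dtw(reference, test, weights):
    """DTW distance in which each test frame's local distance is
    scaled by that frame's reliability weight.

    reference, test: lists of feature frames (lists of floats).
    weights: one weight per test frame.
    The result is normalised by the weight accumulated along the
    selected path (an assumption of this sketch).
    """
    n, m = len(reference), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    wsum = [[0.0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = weights[j - 1]
            local = w * frame_distance(reference[i - 1], test[j - 1])
            # Pick the cheapest of the three standard DTW predecessors.
            pi, pj = min(
                ((i - 1, j), (i, j - 1), (i - 1, j - 1)),
                key=lambda p: cost[p[0]][p[1]],
            )
            cost[i][j] = cost[pi][pj] + local
            wsum[i][j] = wsum[pi][pj] + w
    return cost[n][m] / wsum[n][m] if wsum[n][m] > 0 else inf


if __name__ == "__main__":
    reference = [[0.0], [1.0], [2.0]]
    noisy = [[0.0], [5.0], [2.0]]  # middle frame corrupted by noise
    # Assumed local SNRs in dB for the three test frames.
    weights = [reliability_weight(s) for s in (20.0, -5.0, 20.0)]
    print(weighted_dtw(reference, noisy, [1.0, 1.0, 1.0]))
    print(weighted_dtw(reference, noisy, weights))
```

In this toy case the corrupted middle frame contributes much less to the accumulated distance once it is down-weighted, which is the effect the weighted matching approach aims for. The same kind of weight could, under similar assumptions, scale the log observation probabilities in a Viterbi recursion.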
|