Title: Blind convolutive stereo speech separation and dereverberation
Author: Alinaghi, Atiyeh
ISNI: 0000 0004 6061 6835
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2017
In real environments, microphones record not only the target speech signal but also other active sources, room acoustic effects and background noise. Hence, for many applications, including automatic speech recognition, hearing aids, cochlear implants and human-machine interaction, it is desirable to extract the target speech from the noisy convolutive mixture of multiple sources. There are two major approaches to speech source separation. One group of algorithms, known as blind source separation (BSS), is based on the statistical properties of the signals. The other, known as computational auditory scene analysis (CASA), is inspired by the human auditory system. Using either approach, a voice may be extracted by applying a mask to a time-frequency representation of the noisy reverberant mixed signal. In this thesis, these two groups of techniques are studied, compared and combined based on two state-of-the-art algorithms. In the BSS approach, a frequency-dependent mixing vector (MV) is estimated and exploited to form a probabilistic mask. In the CASA approach, binaural cues such as the interaural time difference (ITD) and interaural level difference (ILD) are calculated and used to estimate a different probabilistic mask. Since the BSS approach performs poorly in high reverberation and the CASA approach fails to separate sources that are close to each other, experiments were conducted to test their combination. The results show a significant improvement in source separation under various conditions. However, the mechanism for this improvement was not clear at first glance. Studying the methods shows that the MV-based algorithm works better when the sources are close to each other, whereas binaural cues yield better performance in the presence of reverberation. Consequently, these two major approaches give complementary improvements under adverse conditions.
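The CASA side of the combination described above can be illustrated with a minimal sketch: per-bin interaural phase and level differences are computed from a stereo STFT and scored against a target source's expected cues to form a soft time-frequency mask. This is an illustrative simplification, not the thesis's exact estimator; the reference cues and Gaussian widths (`ipd_ref`, `ild_ref`, `sigma_ipd`, `sigma_ild`) are hypothetical parameters.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Simple Hann-windowed STFT, returning a (freq, time) array (illustrative)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

def binaural_cues(XL, XR, eps=1e-12):
    """Per-bin interaural phase difference (related to ITD) and ILD in dB."""
    ipd = np.angle(XL * np.conj(XR))
    ild = 20 * np.log10((np.abs(XL) + eps) / (np.abs(XR) + eps))
    return ipd, ild

def soft_mask_from_cues(ipd, ild, ipd_ref=0.0, ild_ref=0.0,
                        sigma_ipd=0.5, sigma_ild=3.0):
    """Gaussian likelihood that each T-F bin belongs to the target source
    whose expected cues are (ipd_ref, ild_ref); parameters are hypothetical."""
    score = (np.exp(-0.5 * ((ipd - ipd_ref) / sigma_ipd) ** 2)
             * np.exp(-0.5 * ((ild - ild_ref) / sigma_ild) ** 2))
    return score  # values in (0, 1], applied as a soft T-F mask
```

In a full system, such a cue-based score would be combined with the MV-based probability per bin (e.g. in an EM framework) before masking and inverse-STFT resynthesis.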
High reverberation still degrades the performance of our source separation algorithm. Therefore, the precedence effect was considered as a means to tackle reverberation. In our algorithm, time-frequency regions dominated by direct sound are identified based on the interaural coherence. The results demonstrate a further significant improvement in performance.
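The coherence-based selection of direct-sound regions can be sketched as follows: magnitude-squared coherence between the two channels, smoothed over a few time frames, is close to one in bins dominated by the direct path and drops where diffuse reverberation dominates. The smoothing window length here is a hypothetical choice for illustration, not the thesis's exact estimator.

```python
import numpy as np

def interaural_coherence(XL, XR, win=5, eps=1e-12):
    """Magnitude-squared coherence between left/right STFTs (freq, time),
    smoothed over `win` time frames. Values near 1 indicate T-F regions
    dominated by direct sound; hypothetical smoothing window."""
    def smooth(A):
        k = np.ones(win) / win
        return np.apply_along_axis(
            lambda r: np.convolve(r, k, mode='same'), 1, A)
    cross = smooth(XL * np.conj(XR))      # smoothed cross-spectrum
    pL = smooth(np.abs(XL) ** 2)          # smoothed left auto-spectrum
    pR = smooth(np.abs(XR) ** 2)          # smoothed right auto-spectrum
    return np.abs(cross) ** 2 / (pL * pR + eps)  # in [0, 1]
```

Bins whose coherence exceeds a threshold could then be given greater weight when estimating the source models, emulating the precedence effect's emphasis on the first-arriving wavefront.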
Supervisor: Jackson, P. J. Sponsor: Centre for Vision, Speech and Signal Processing (CVSSP)
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available