Title: Separation of sound sources : a machine audition perspective
Author: Litwic, Lukasz
ISNI: 0000 0004 5370 9320
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2015
Speech separation by machines has been extensively studied for many decades, and several algorithms and systems have been proposed. Since the speech separation task for machines is often likened to the separation performed (remarkably well) by the human auditory system, several analogies can be found in the proposed systems. This thesis takes a localised view of a few aspects of the speech separation task and explores some of these analogies from a machine audition perspective. The first part of the thesis presents algorithms for binaural localisation and separation of speech sources based solely on analysis of the Interaural Phase Difference (IPD) cue. The IPD cue encodes the time delay between two microphones, which can be used to establish the spatial locations of the sources in the mixture. One well-known problem with processing the IPD cue is its periodic nature: a single IPD value can correspond to several spatial locations of the source. The phase ambiguity problem has been studied for human auditory processing as well as for machines, but mostly from a source localisation perspective. Relatively little attention has been given to the phase ambiguity that arises from the interaction of the IPDs of the different sources present in the mixture. Investigations presented in the thesis explore the use of IPDs by machines for robust source localisation and separation. First, an algorithm for source localisation is introduced. The algorithm is based on a Maximum Likelihood Sample Consensus (MLESAC) search for line patterns that correspond to speech sources. The search is performed on a Cross-phasogram representation of the IPDs. Next, a study of the impact of phase ambiguity on separation performance is presented. A source separation algorithm called Localisation based Mask for Source Separation (LOCUS) is introduced. The LOCUS algorithm models the IPDs using a Gaussian Mixture Model (GMM).
Analysis of the interaction between the IPDs of different sources is shown to improve initialisation of the GMM and, in consequence, to provide performance gains over state-of-the-art binaural separation methods. The second part of this thesis focuses on using the harmonicity cue for speech separation. Harmonicity is a feature of voiced speech and therefore intuitively seems a powerful cue that could enhance the separation of speech sources. However, in a multi-speaker scenario, segregation of harmonic components is not trivial, as it relies heavily on the underlying multi-source pitch determination algorithm. The proposed system uses an approach in which speech sources are first reconstructed using the LOCUS algorithm and then fed into a single-source pitch determination algorithm. This makes it possible to use well-established single-source pitch determination algorithms, which are known for the robustness and accuracy of the pitch trajectories they provide. Based on this approach, the Pitch based Harmonicity Mask for Source Separation (PRIMUS) algorithm is introduced. The approach is analogous to other separation systems found in the literature; however, there has been little formal validation of some of the algorithmic choices that such an approach requires. Therefore, a detailed review is presented, followed by experimental studies of all stages of the algorithm, from reconstruction of the speech sources to calculation of the corresponding separation masks. The final evaluation covers the PRIMUS and JANUS (Joint Localisation and Harmonicity Mask for Source Separation) algorithms, where the JANUS algorithm computes a set of joint separation masks that combine the outputs of the LOCUS and PRIMUS algorithms. The experimental results show improvements in separation performance over state-of-the-art binaural separation methods.
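The periodic nature of the IPD cue described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis and is not the MLESAC or LOCUS algorithm. It assumes a simple free-field delay model (no head shadowing), a microphone spacing of 0.17 m and a speed of sound of 343 m/s; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for illustration


def wrap_phase(phi):
    """Wrap a phase in radians to the principal interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def ipd(delay_s, freq_hz):
    """Wrapped inter-microphone phase difference for a pure time delay."""
    return wrap_phase(2.0 * math.pi * freq_hz * delay_s)


def candidate_delays(ipd_rad, freq_hz, mic_distance_m):
    """List every physically possible delay that produces the observed IPD.

    A delay is kept only if it does not exceed the travel time of sound
    across the microphone spacing (free-field assumption).
    """
    max_delay = mic_distance_m / SPEED_OF_SOUND
    # Number of full phase wraps that could fit inside the delay range.
    k_max = int(math.floor(freq_hz * max_delay + 0.5)) + 1
    delays = []
    for k in range(-k_max, k_max + 1):
        tau = (ipd_rad + 2.0 * math.pi * k) / (2.0 * math.pi * freq_hz)
        if abs(tau) <= max_delay:
            delays.append(tau)
    return delays


if __name__ == "__main__":
    true_delay = 0.0003  # seconds, hypothetical source delay
    for freq in (500.0, 4000.0):
        phi = ipd(true_delay, freq)
        print(freq, candidate_delays(phi, freq, 0.17))
```

Under these assumptions, the 500 Hz component yields a single candidate delay, whereas the 4000 Hz component yields four, only one of which is the true delay. This is the ambiguity that has to be resolved before an IPD can be mapped to a source location.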
Supervisor: Jackson, P. J. B.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available