Title: Single channel audio separation using deep neural networks and matrix factorizations
Author: Wu, Di
ISNI: 0000 0004 7429 9826
Awarding Body: Newcastle University
Current Institution: University of Newcastle upon Tyne
Date of Award: 2017
Source separation has become a significant research topic in both the signal processing and the machine learning communities. Because of its numerous applications, such as automatic speech recognition and speech communication, separating target speech from a mixed signal is of great importance. In many practical applications, separating speech from a single recorder (i.e. a single channel) is the most desirable setting. In this thesis, two novel approaches are proposed to address this single-channel audio separation problem. The thesis first reviews traditional approaches to single-channel source separation and then introduces a more generic approach with stronger feature-learning capability, namely deep graphical models. In the first part of this thesis, a novel approach based on matrix factorization and a hierarchical model is proposed. In this work, an artificial stereo mixture is formulated to provide extra information. In addition, a hybrid framework that combines the generalized Expectation-Maximization algorithm with a multiplicative update rule is proposed to optimize the parameters of a matrix-factorization-based model that approximately separates the mixture. Furthermore, a hierarchical model based on an extreme learning machine is developed to check the validity of the approximately separated sources, followed by an energy minimization method that further improves the quality of the separated sources by generating a time-frequency mask. Various experiments have been conducted, and the results show that the proposed approach outperforms conventional approaches not only in reducing computational complexity but also in separation performance. In the second part, a deep neural network based ensemble system is proposed. In this work, the complementary properties of different features are fully explored by ‘wide’ and ‘forward’ ensemble systems.
In addition, features learned at the penultimate layer, rather than at the output layer, are investigated. The final embedded features are classified with an extreme learning machine to generate a binary mask that separates the mixed signal. The experiment focuses on speech in the presence of music, and the results demonstrate that the proposed ensemble system can thoroughly exploit the complementary properties of various features under various conditions, with promising separation performance.
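The first part of the abstract rests on matrix factorization optimized with multiplicative updates, followed by time-frequency masking. The sketch below is not the thesis's method: it omits the artificial stereo mixture, the generalized EM step, the extreme-learning-machine validity check and the energy-minimization mask. It shows only plain supervised NMF with the standard Lee-Seung multiplicative updates for the generalized KL divergence, using per-source bases learned on training spectrograms (random stand-in data here) and a soft ratio mask. The names `nmf_kl` and `separate` are hypothetical.

```python
import numpy as np

def nmf_kl(V, W=None, n_components=8, n_iter=200, update_W=True, eps=1e-12, seed=0):
    """Approximate V ~= W @ H with Lee-Seung multiplicative updates (KL divergence).

    If W is supplied and update_W is False, only the activations H are learned.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps if W is None else W.copy()
    H = rng.random((W.shape[1], T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)      # activation update
        if update_W:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (ones @ H.T + eps)  # basis update
    return W, H

def separate(V_mix, W_speech, W_music, n_iter=200, eps=1e-12):
    """Hypothetical helper: fit activations on fixed, concatenated source bases,
    then split the mixture with a soft (ratio) time-frequency mask."""
    W = np.hstack([W_speech, W_music])
    _, H = nmf_kl(V_mix, W=W, n_iter=n_iter, update_W=False)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech part of the model
    M_hat = W_music @ H[k:]    # music part of the model
    mask = S_hat / (S_hat + M_hat + eps)
    return mask * V_mix, (1.0 - mask) * V_mix, mask

# Random non-negative stand-ins for magnitude spectrograms (frequency x time).
rng = np.random.default_rng(1)
W_s, _ = nmf_kl(rng.random((64, 100)), n_components=8)   # "speech" training data
W_m, _ = nmf_kl(rng.random((64, 100)), n_components=8)   # "music" training data
V_mix = rng.random((64, 50))
speech_est, music_est, mask = separate(V_mix, W_s, W_m)
```

Because the mask and its complement sum to one, the two estimates add back up to the mixture magnitude. A binary mask (thresholding `mask` at 0.5) would be a drop-in alternative.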
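The second part ends with an extreme learning machine (ELM) that labels each time-frequency unit to form a binary mask. What follows is a minimal regularized ELM sketch under toy assumptions. The per-unit features are the oracle source magnitudes, chosen only so the example is checkable; in the thesis they are penultimate-layer embeddings from the DNN ensemble, and at test time only the mixture would be available. The ideal binary mask serves as the training label. `ELMClassifier` is a hypothetical class, not a library API.

```python
import numpy as np

class ELMClassifier:
    """Minimal regularized extreme learning machine for binary labels.

    Hidden-layer weights are random and never trained; only the output
    weights are solved in closed form (ridge-regularized least squares).
    """

    def __init__(self, n_hidden=200, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # assumed ridge term, added for numerical stability
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of a fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W_in + self.b)))

    def fit(self, X, y):
        self.W_in = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        T = np.where(y > 0, 1.0, -1.0)  # map {0, 1} labels to {-1, +1} targets
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        # 1 = unit assigned to speech, 0 = unit assigned to music.
        return (self._hidden(X) @ self.beta > 0).astype(int)

# Synthetic stand-ins for speech and music magnitude spectrograms (F x T).
rng = np.random.default_rng(1)
F, T = 32, 40
S, M = rng.random((F, T)), rng.random((F, T))
ibm = (S > M).astype(int)  # ideal binary mask used as training labels

# Hypothetical per-unit features: here simply the two source magnitudes.
X = np.column_stack([S.ravel(), M.ravel()])
elm = ELMClassifier().fit(X, ibm.ravel())
mask = elm.predict(X).reshape(F, T)

mixture = S + M               # magnitude-domain mixture (phase ignored)
speech_est = mask * mixture   # binary-masked speech estimate
accuracy = (mask == ibm).mean()
```

Only the output weights `beta` are fitted, so ELM training reduces to a single linear solve. This is the property that makes it cheap to use as the final classifier on top of learned features.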
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available