Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.504557
Title: Audio Source Separation by Time-frequency Masking
Author: Nesbit, Andrew Luke
Awarding Body: Queen Mary, University of London
Current Institution: Queen Mary, University of London
Date of Award: 2008
Abstract:
Audio source separation is a particularly interesting problem when the number of mixture channels is less than the number of sources. Our motivation for studying this is that recorded stereo music signals can often be approximated by the two-channel case. Such mixtures often have a high degree of overlapping partial frequencies and are especially challenging for standard techniques. We attempt to solve the problem by time-frequency masking methods, using transforms which give sparse signal representations. Our first contribution is to compare binary time-frequency masking using fixed-basis transforms, such as the short-time Fourier transform, with a new, computationally efficient method using adaptive lapped orthogonal transforms to maximise the energy of the estimated source coefficients. This assumes prior knowledge of the mixing structure (the semi-blind case). Experiments demonstrate that adaptive transforms may sometimes give better performance than fixed-basis transforms. Secondly, we describe how adaptive windowing can cause distortions in the estimated sources due to the masking process. Minimising these distortions is a trade-off between minimising blocking artifacts and minimising time-domain aliasing errors. Experiments indicate that excessive blocking artifacts decrease performance more than time-domain aliasing effects do. We propose various modifications to the transforms and mask estimation techniques to reduce these distortions. Thirdly, we describe statistically motivated extensions to binary masking techniques which allow more than one source to be active at any time-frequency index. We also develop oracle estimators to determine empirical upper performance bounds, assuming that we have reference sources available. Oracle experiments indicate that excellent potential performance is possible compared to semi-blind methods, particularly for adaptive transforms or when more than one source coefficient is allowed to be active.
Finally, we conclude by outlining future research directions for (semi-)blind methods to approach the potential performance gains indicated by the oracle methods and to increase the applicability of our methods.
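As a rough illustration of the oracle binary masking idea mentioned in the abstract, the following sketch assigns each time-frequency coefficient of a mixture to whichever reference source has the largest magnitude there, then inverts the masked transform. It is not the thesis's method: it assumes a single mixture channel rather than a stereo mixture, a fixed-basis short-time Fourier transform with a square-root Hann window and 50% overlap rather than adaptive lapped orthogonal transforms, and two synthetic sinusoids as sources. All names (`stft`, `istft`, `oracle_binary_masks`, `N_FFT`) and parameter values are hypothetical choices made for the example.

```python
import numpy as np

N_FFT = 512                                  # frame length (assumed value)
HOP = N_FFT // 2                             # 50% overlap
WIN = np.sqrt(np.hanning(N_FFT + 1)[:-1])    # periodic sqrt-Hann window

def stft(x):
    """Fixed-basis STFT; rows are frames, columns are frequency bins."""
    # Pad so every original sample is covered by exactly two frames.
    x = np.pad(x, (HOP, HOP + (-len(x)) % HOP))
    n_frames = len(x) // HOP - 1
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, length):
    """Inverse STFT by windowed overlap-add, trimmed to `length` samples."""
    frames = np.fft.irfft(X, n=N_FFT, axis=1) * WIN
    out = np.zeros(HOP * (len(frames) - 1) + N_FFT)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out[HOP:HOP + length]

def oracle_binary_masks(source_stfts):
    """One boolean mask per source: True where that source dominates."""
    winner = np.argmax(np.abs(np.stack(source_stfts)), axis=0)
    return [winner == j for j in range(len(source_stfts))]

# Synthetic example: two sinusoids mixed into one channel (assumed data).
fs = 8000
t = np.arange(fs) / fs
sources = [np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1000 * t)]
mixture = sum(sources)

# The oracle uses the reference sources to build the masks.
masks = oracle_binary_masks([stft(s) for s in sources])
X = stft(mixture)
estimates = [istft(X * m, len(mixture)) for m in masks]

for j, (s, e) in enumerate(zip(sources, estimates)):
    snr_db = 10 * np.log10(np.sum(s ** 2) / np.sum((s - e) ** 2))
    print(f"source {j}: SNR of oracle estimate = {snr_db:.1f} dB")
```

Because the masks partition the time-frequency plane and the inverse transform is linear, the estimates in this sketch sum back to the mixture. A semi-blind variant would have to estimate the masks from the mixture and the known mixing structure instead of from the reference sources.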
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.504557
DOI: Not available