Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.655520
Title: Improving the perceptual quality of single-channel blind audio source separation
Author: Stokes, Tobias W.
ISNI: 0000 0004 5365 3304
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2015
Abstract:
Given a mixture of audio sources, a blind audio source separation (BASS) tool is required to extract audio relating to one specific source whilst attenuating that related to all others. This thesis answers the question “How can the perceptual quality of BASS be improved for broadcasting applications?” The most common source separation scenario, particularly in the field of broadcasting, is single channel, and this is especially challenging as only a limited set of cues is available. Broadcasting also requires that a source separator is automated, capable of handling non-stationary, reverberant mixtures and able to separate an unknown number of sources. In the single-channel case, the time-frequency mask is a common method of separation. However, this process produces artefacts in the separated audio. The perceptual evaluation for audio source separation (PEASS) toolkit represents an efficient way to generate a multi-dimensional measure of perceptual quality. Initial experimental work, using ideal target and interferer estimates, uses PEASS to test variations on the ideal binary mask and shows that continuous masks are perceptually better than binary masks, while identifying a trade-off between artefacts and interferer suppression. To explore the optimisation of this trade-off, a series of sigmoidal functions is used to map target-to-mixture ratios to mask coefficients. The mask identified as the optimum applies less target-to-mixture-based discrimination than those typically found in the literature. Further experiments applying offsets, hysteresis, smoothing and frequency-dependency to the mask do not show any benefit in audio quality. The optimal sigmoidal mask is also demonstrated to be superior under non-ideal conditions, using a non-negative matrix factorisation algorithm to produce the estimates.
A final listening test compares the outputs of binary, ratio and optimal sigmoidal masks, concluding that listeners prefer the ratio mask to the sigmoidal mask and both continuous masks to the binary mask.
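The three mask types compared in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the target-to-mixture ratio is approximated as |T| / (|T| + |I|) on magnitude spectrograms, the mixture magnitude is approximated as the sum of the two magnitudes, and the function names, `slope` and `midpoint` parameters are hypothetical. A steep sigmoid approaches the binary mask, while a shallow one discriminates less between target- and interferer-dominated bins.

```python
import numpy as np

def target_to_mixture_ratio(target_mag, interferer_mag, eps=1e-12):
    """Assumed definition: per-bin ratio |T| / (|T| + |I|), in [0, 1]."""
    return target_mag / (target_mag + interferer_mag + eps)

def binary_mask(tmr, threshold=0.5):
    """1 where the target dominates the bin, 0 elsewhere."""
    return (tmr > threshold).astype(float)

def ratio_mask(tmr):
    """Continuous mask equal to the target-to-mixture ratio itself."""
    return tmr

def sigmoidal_mask(tmr, slope=10.0, midpoint=0.5):
    """Logistic mapping from ratio to mask coefficient.

    slope and midpoint are hypothetical parameters: a larger slope gives
    sharper (more binary-like) discrimination, a smaller slope a softer mask.
    """
    return 1.0 / (1.0 + np.exp(-slope * (tmr - midpoint)))

# Toy 2x2 "spectrograms" (rows = frequency bins, columns = time frames).
target_mag = np.array([[1.0, 0.2],
                       [0.5, 0.0]])
interferer_mag = np.array([[0.1, 0.8],
                           [0.5, 1.0]])

tmr = target_to_mixture_ratio(target_mag, interferer_mag)

# Approximation: treat magnitudes as additive to form the mixture.
mixture_mag = target_mag + interferer_mag

masks = {
    "binary": binary_mask(tmr),
    "ratio": ratio_mask(tmr),
    "sigmoidal": sigmoidal_mask(tmr, slope=10.0),
}

# Applying a mask is an element-wise product with the mixture spectrogram.
separated = {name: mask * mixture_mag for name, mask in masks.items()}

for name, mask in masks.items():
    print(name)
    print(np.round(mask, 3))
```

In a full system the masked spectrogram would be combined with the mixture phase and inverted back to a waveform; that step is omitted here.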
Supervisor: Brookes, Tim; Hummersone, C.
Sponsor: Engineering and Physical Sciences Research Council; British Broadcasting Corporation Research and Development
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.655520
DOI: Not available