Use this URL to cite or link to this record in EThOS: | https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.507371 |
Title: | Multimodal methods for blind source separation of audio sources | ||||||
Author: | Naqvi, Syed Mohsen Raza | ISNI: | 0000 0004 2676 3006 | ||||
Awarding Body: | Loughborough University | ||||||
Current Institution: | Loughborough University | ||||||
Date of Award: | 2009 | ||||||
Abstract: | |||||||
This thesis focuses on enhancing the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when they are applied to separating audio sources recorded in a room environment. This challenging application is termed the cocktail party problem, and the ultimate aim would be to build a machine which matches the ability of a human being to solve this task. Human beings exploit both their eyes and their ears in solving this task and hence adopt a multimodal approach, i.e. they exploit both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of an FDCBSS algorithm. The positions of the human speakers are monitored by video cameras, and this information is incorporated within the FDCBSS algorithm in the form of constraints added to the underlying cross-power spectral density matrix-based cost function, which measures separation performance.
Supervisor: | Not available | Sponsor: | Not available | ||||
Qualification Name: | Thesis (Ph.D.) | Qualification Level: | Doctoral | ||||
EThOS ID: | uk.bl.ethos.507371 | DOI: | Not available | ||||