Title: Recognition of sound sources and acoustic events in music and environmental audio
Author: Giannoulis, Dimitrios
ISNI: 0000 0004 5360 3948
Awarding Body: Queen Mary, University of London
Current Institution: Queen Mary, University of London
Date of Award: 2014
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Thesis embargoed until 01 Jun 2100
Access from Institution:
Hearing, together with the other senses, enables us to perceive the surrounding world through the sensory data we constantly receive. The information carried in these data allows us to classify the environment and the objects in it. In modern society, the loud and noisy acoustic environment that surrounds us makes the task of "listening" quite challenging, probably more so than ever before. A great deal of information has to be filtered to separate the sounds we want to hear from unwanted noise and interference. And yet humans, like other living organisms, have a remarkable ability to identify and track the sounds they want, irrespective of their number, their degree of overlap and the interference that surrounds them. To this day, building systems that "listen" to the surrounding environment and identify the sounds in it the way humans do remains challenging, and even though steps have been made towards reaching human performance, we are still a long way from systems able to identify and track most, if not all, of the different sounds within an acoustic scene. In this thesis, we deal with the task of recognising sound sources or acoustic events in two distinct types of audio: music and more generic environmental sounds. We reformulate the problem and redefine the task associated with each case. Music can be regarded as a multi-source environment in which the different sound sources (musical instruments) become active at different times, and recognising the musical instruments is then a central part of the more general process of automatic music transcription. The principal question we address is whether we can develop a system able to recognise musical instruments in a multi-instrument scenario where many different instruments are active at the same time, and for that we draw on human performance.
The proposed system is based on missing-feature theory, and we find that it retains high performance even under the most adverse listening conditions (i.e. low signal-to-noise ratio). Finally, we propose a technique to fuse this system with an automatic music transcription system in an attempt to inform and improve overall performance. For more generic environmental audio scenes, things are less clear and research in the area is still scarce. The central issue here is to formulate the problem of sound recognition and define the subtasks and associated difficulties. We have set up and run a worldwide challenge and created datasets intended to enable researchers to perform better-quality research in the field. We have also developed systems that could serve as baselines for future research, and compared existing state-of-the-art algorithms to one another, and against human performance, in an effort to highlight the strengths and weaknesses of existing methodologies.
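The core idea of missing-feature classification mentioned above can be illustrated with a minimal sketch (not the thesis's actual system): each sound class is modelled by a diagonal Gaussian over spectral features, features estimated to be noise-dominated are marked as missing, and the class likelihood is marginalised over the reliable dimensions only. The function names, the SNR-based mask, and the toy class models ("flute", "oboe") are all hypothetical choices for illustration.

```python
import numpy as np

def log_likelihood(x, mask, mean, var):
    """Log-likelihood of x under N(mean, diag(var)), using only the
    dimensions where mask is True (marginalising out missing features)."""
    d = x[mask] - mean[mask]
    v = var[mask]
    return -0.5 * np.sum(np.log(2 * np.pi * v) + d * d / v)

def classify(x, snr, models, snr_threshold=0.0):
    """Pick the class with the highest marginalised likelihood.
    Features whose estimated SNR (dB) falls below the threshold are
    treated as missing. `models` maps class name -> (mean, var)."""
    mask = snr > snr_threshold
    scores = {c: log_likelihood(x, mask, m, v) for c, (m, v) in models.items()}
    return max(scores, key=scores.get)
```

In this sketch, a feature vector whose last dimensions are corrupted by interference can still be classified correctly, because those dimensions are excluded from the likelihood rather than being allowed to dominate it; this is the mechanism that lets missing-feature methods degrade gracefully at low signal-to-noise ratios.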
Supervisor: Not available
Sponsor: Queen Mary University of London
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Electronic Engineering ; Hearing ; Music analysis ; Acoustics