Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.800981
Title: Sound event detection with weakly labelled data
Author: Kong, Qiuqiang
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2020
Abstract:
Sound event detection (SED) is the task of detecting the onset and offset times of sound events in an audio recording. SED has many applications in both academia and industry, such as multimedia information retrieval and monitoring domestic and public security. However, compared to speech signal processing, which has been researched for many years, the classification and detection of general sounds received little attention until recent years. One limitation of research on audio classification and sound event detection has been the lack of publicly available datasets, a situation that changed with the release of the Detection and Classification of Acoustic Scenes and Events (DCASE) dataset. The DCASE dataset consists of data for acoustic scene classification (ASC), audio tagging (AT) and sound event detection. ASC and AT are tasks to design systems that predict pre-defined labels for an audio clip. SED is the task of designing systems that predict both the presence or absence of sound events in an audio clip and the onset and offset times of those sound events. One difficulty of audio classification and SED is that many datasets, including the DCASE dataset, are weakly labelled: only the presence or absence of sound events in an audio clip is known, without onset and offset annotations of the sound events. This thesis focuses on solving the audio tagging and sound event detection problem using only weakly labelled data. It proposes attention neural networks to solve the general weakly labelled AT and SED problem. The attention neural networks automatically learn to attend to important segments and to ignore silence and irrelevant segments in an audio clip. We developed a set of weakly supervised learning methods for AT and SED using attention neural networks. The proposed methods achieve state-of-the-art performance in audio tagging and sound event detection.
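The record contains no code, but the attention pooling idea described in the abstract (weighting informative segments and down-weighting silence so that only clip-level, weakly labelled targets are needed) can be illustrated with a minimal sketch. This is a hypothetical illustration only, assuming PyTorch, frame-level embeddings produced by some unspecified backbone, and arbitrary feature and class dimensions; it is not the thesis's exact model.

import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Decision-level attention pooling over time frames (illustrative sketch).

    Frame-wise class probabilities are combined with learned attention
    weights, so informative segments dominate the clip-level prediction
    while silent or irrelevant frames are down-weighted.
    """
    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, num_classes)  # frame-wise class scores
        self.attention = nn.Linear(feature_dim, num_classes)   # frame-wise attention scores

    def forward(self, x):
        # x: (batch, time_frames, feature_dim) frame-level embeddings from a backbone
        frame_prob = torch.sigmoid(self.classifier(x))     # (batch, T, classes)
        att = torch.softmax(self.attention(x), dim=1)       # normalise attention over time
        clip_prob = torch.sum(att * frame_prob, dim=1)      # (batch, classes)
        return clip_prob, frame_prob  # clip-level output for AT, frame-level for SED

if __name__ == "__main__":
    # Hypothetical shapes: 4 clips, 100 frames, 64-dimensional embeddings, 10 classes.
    model = AttentionPooling(feature_dim=64, num_classes=10)
    embeddings = torch.randn(4, 100, 64)
    clip_prob, frame_prob = model(embeddings)
    print(clip_prob.shape, frame_prob.shape)  # torch.Size([4, 10]) torch.Size([4, 100, 10])

Training such a model requires only clip-level labels (the weakly labelled setting), while the frame-level probabilities provide onset and offset estimates at inference time.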
Supervisor: Plumbley, Mark
Sponsor: China Scholarship Council (CSC)
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.800981