Use this URL to cite or link to this record in EThOS: | https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.505787 |
Title: | Informing multisource decoding in robust automatic speech recognition | ||||
Author: | Ma, Ning |
ISNI: | 0000 0004 2679 4566 | ||||
Awarding Body: | The University of Sheffield | ||||
Current Institution: | University of Sheffield | ||||
Date of Award: | 2009 | ||||
Availability of Full Text: |
|
||||
Abstract: | |||||
Listeners are remarkably adept at recognising speech in natural multisource environments, while most Automatic Speech Recognition (ASR) technology fails in these conditions. It has been proposed that this human ability is governed by Auditory Scene Analysis (ASA) processes, in which a sound mixture is segregated into perceptual packages, called 'streams', by a combination of bottom-up and top-down processing. This thesis examines a novel ASR framework based on the ASA account, Speech Fragment Decoding (SFD). A 'fragment' is a spectro-temporal region where energy from a single sound source dominates. SFD employs techniques developed from knowledge about the auditory system to identify fragments. A decoding process using statistical speech models is then applied to the fragment representation to simultaneously identify speech evidence and recognise speech. A range of small-vocabulary speech recognition experiments is conducted for evaluation.
|
|||||
Supervisor: | Not available | Sponsor: | Not available | ||
Qualification Name: | Thesis (Ph.D.) | Qualification Level: | Doctoral | ||
EThOS ID: | uk.bl.ethos.505787 | DOI: | Not available | ||