Use this URL to cite or link to this record in EThOS:
Title: Visual speech enhancement and its application in speech perception training
Author: Alghamdi, Najwa
ISNI:       0000 0004 7225 2805
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Access from Institution:
This thesis investigates methods for visual speech enhancement to support auditory and audiovisual speech perception. Normal-hearing non-native listeners receiving cochlear implant (CI) simulated speech are used as ‘proxy’ listeners for CI users, a proposed user group who could benefit from such enhancement methods in speech perception training. Both CI users and non-native listeners share similarities with regards to audiovisual speech perception, including increased sensitivity to visual speech cues. Two enhancement methods are proposed: (i) an appearance based method, which modifies the appearance of a talker’s lips using colour and luminance blending to apply a ‘lipstick effect’ to increase the saliency of mouth shapes; and (ii) a kinematics based method, which amplifies the kinematics of the talker’s mouth to create the effect of more pronounced speech (an ‘exaggeration effect’). The application that is used to test the enhancements is speech perception training, or audiovisual training, which can be used to improve listening skills. An audiovisual training framework is presented which structures the evaluation of the effectiveness of these methods. It is used in two studies. The first study, which evaluates the effectiveness of the lipstick effect, found a significant improvement in audiovisual and auditory perception. The second study, which evaluates the effectiveness of the exaggeration effect, found improvement in the audiovisual perception of a number of phoneme classes; no evidence was found of improvements in the subsequent auditory perception, as audiovisual recalibration to visually exaggerated speech may have impeded learning when used in the audiovisual training. The thesis also investigates an example of kinematics based enhancement which is observed in Lombard speech, by studying the behaviour of visual Lombard phonemes in different contexts. Due to the lack of suitable datasets for this analysis, the thesis presents a novel audiovisual Lombard speech dataset recorded under high SNR, which offers two, fixed head-pose, synchronised views of each talker in the dataset.
Supervisor: Maddock, Steve ; Barker, Jon ; Brown, Guy J. Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available