Title: Eye-speech affect detection for automatic speech recognition
Author: Alhargan, Ashwaq H.
ISNI:       0000 0004 7972 7279
Awarding Body: University of Birmingham
Current Institution: University of Birmingham
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Thesis embargoed until 01 Jan 2025
Human-computer interaction (HCI) is becoming increasingly natural. Machines can now recognise faces, understand individual speech and converse as a human would. However, they are still far from exhibiting humanlike intelligence. Affects play an important role in interaction, so understanding and responding to them are necessary steps towards more natural HCI. This thesis reports the development and evaluation of affect detection systems suitable for use in real-life HCI applications (e.g. speech-enabled interfaces such as Alexa) using speech and eye movement modalities. A corpus of spontaneous affective responses in these modalities was collected within an interactive virtual gaming environment designed to elicit different affective states corresponding to the arousal and valence dimensions. A support vector machine was employed as a classifier to detect the elicited affects from both modalities. Several features of eye movement, namely pupillary response, fixation, saccade and blinking, are assessed for use in affect detection, and new pupil response features based on the Hilbert transform are proposed. Acoustic and lexical characteristics of speech are also investigated. The detection results suggest that eye movement is superior to speech, with pupillary response features based on the Hilbert transform yielding superior performance on the arousal dimension, whereas saccade and fixation features perform better on the valence dimension. The improvement gained by combining information from the eye movement and speech modalities suggests that the two carry complementary information for affect detection and that both warrant incorporation where feasible. An automatic speech recognition (ASR) application that integrates affective information from both modalities to improve robustness to affect was also investigated. The best performing system uses affective information from eye movements and significantly reduces word error rates compared with using the speech modality alone.
This work highlights the potential of eye movements as an additional modality to speech to enhance the accuracy of affect detection and facilitate the development of robust affect-aware speech-enabled interfaces.
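As an illustration of what Hilbert-transform-based pupil features might look like, the following is a minimal sketch only. The abstract does not say which quantities the thesis derives from the transform, so the envelope and instantaneous-frequency features, the function names `analytic_signal` and `hilbert_features`, and the synthetic pupil trace are all assumptions made for this example. The analytic signal is computed with a direct DFT to keep the sketch dependency-free; in practice `scipy.signal.hilbert` computes the same analytic signal far more efficiently.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence, via a direct O(N^2) DFT."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue              # keep DC and (for even n) Nyquist unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2      # double the positive frequencies
        else:
            spectrum[k] = 0       # discard the negative frequencies
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def hilbert_features(pupil, fs):
    """Hypothetical features: mean envelope and mean instantaneous frequency."""
    mean = sum(pupil) / len(pupil)
    centred = [p - mean for p in pupil]          # remove baseline pupil size
    z = analytic_signal(centred)
    envelope = [abs(v) for v in z]
    # Phase step between consecutive samples, converted to Hz.
    inst_freq = [
        cmath.phase(z[i + 1] * z[i].conjugate()) * fs / (2 * math.pi)
        for i in range(len(z) - 1)
    ]
    return {
        "mean_envelope": sum(envelope) / len(envelope),
        "mean_inst_freq_hz": sum(inst_freq) / len(inst_freq),
    }


# Synthetic trace (assumed values): 4 mm baseline with a 0.5 mm, 2 Hz oscillation.
fs = 32
signal = [4.0 + 0.5 * math.cos(2 * math.pi * 2 * t / fs) for t in range(64)]
feats = hilbert_features(signal, fs)
```

Feature values of this kind, computed per trial, could then be passed to a classifier such as the support vector machine mentioned in the abstract.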
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Q Science (General) ; T Technology (General)