Title: Audiovisual discrimination between laughter and speech
Author: Petridis, Stavros
ISNI: 0000 0004 2718 4188
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2012
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Laughter is clearly an audiovisual event, consisting of the laughter vocalisation together with facial activity around the mouth. Past research on automatic laughter classification has focused mainly on audio-based approaches. In this thesis we integrate the information from the audio and video channels and show that this fusion may lead to improved performance over unimodal approaches. We investigate different types of audiovisual fusion, temporal modelling and feature sets in order to find the best combination. A novel prediction-based approach to combining audio and visual information is also proposed, which explicitly models the spatial and temporal relationships between audio and visual features. Experiments are presented both under matched training and test conditions, using subject-independent cross-validation on one database, and under unmatched conditions, using 6 databases; the latter is a challenging setting that is rarely addressed in the literature. The different fusion approaches are compared on these databases, confirming that the proposed prediction-based method usually performs better than standard fusion methods. The lack of suitable data is a major obstacle to studying laughter, so we introduce a new publicly available audiovisual database suitable for this purpose. It contains recordings of 22 subjects, captured with two microphones, a video camera and a thermal camera while they watched stimulus material. The errors of the audio, video and audiovisual classifiers are also analysed in terms of gender, language, laughter type and noise level, in order to gain insight into when visual information helps. Finally, we present results from the first attempt to discriminate between two types of laughter, voiced and unvoiced, using audiovisual information.
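The abstract compares the proposed prediction-based fusion against "standard fusion methods" without naming them. As a rough illustration only, the Python sketch below shows two widely used strategies, feature-level (early) fusion and decision-level (late) fusion, which we assume are representative of that baseline category. It does not reproduce the thesis's prediction-based method, and the features, weights and toy logistic scorer (`make_linear_scorer`) are hypothetical stand-ins for trained classifiers.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]  # maps a feature vector to P(laughter) in [0, 1]


def make_linear_scorer(weights: Sequence[float], bias: float = 0.0) -> Scorer:
    """Toy logistic scorer standing in for a trained classifier.
    Hypothetical: the weights used below are made up for illustration."""
    def score(x: Vector) -> float:
        if len(x) != len(weights):
            raise ValueError("feature vector length does not match weights")
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))  # logistic sigmoid
    return score


def feature_level_fusion(audio: Vector, video: Vector, classifier: Scorer) -> float:
    """Early fusion: concatenate audio and visual features for the same
    time step, then score the joint vector with a single classifier."""
    joint: List[float] = list(audio) + list(video)
    return classifier(joint)


def decision_level_fusion(audio: Vector, video: Vector,
                          audio_clf: Scorer, video_clf: Scorer,
                          audio_weight: float = 0.5) -> float:
    """Late fusion: score each modality separately, then combine the two
    laughter probabilities with a weighted sum."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    p_audio = audio_clf(audio)
    p_video = video_clf(video)
    return audio_weight * p_audio + (1.0 - audio_weight) * p_video


def label(p_laughter: float, threshold: float = 0.5) -> str:
    """Turn a laughter probability into a laughter-vs-speech decision."""
    return "laughter" if p_laughter >= threshold else "speech"


# Hypothetical example frame: two audio features and one visual feature
# (e.g. a mouth-opening measure); values are invented for illustration.
audio_feats = [0.2, 0.4]
video_feats = [0.9]
audio_clf = make_linear_scorer([1.0, -0.5])
video_clf = make_linear_scorer([2.0])
joint_clf = make_linear_scorer([1.0, -0.5, 2.0])

print(label(feature_level_fusion(audio_feats, video_feats, joint_clf)))  # → laughter
print(label(decision_level_fusion(audio_feats, video_feats, audio_clf, video_clf)))  # → laughter
```

Early fusion lets a single classifier learn cross-modal correlations but needs the audio and visual streams aligned frame by frame; late fusion tolerates differing frame rates and makes it easy to down-weight an unreliable modality through `audio_weight`.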
Overall, it is demonstrated that in most cases adding visual information to audio improves performance in laughter-vs-speech discrimination, and that audiovisual fusion is particularly beneficial as audio noise levels increase.
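The abstract reports that fusion helps more as audio noise levels increase, but does not say how those noise levels were obtained. One common, generic way to evaluate a classifier at controlled noise levels is to add a noise recording to clean audio at a chosen signal-to-noise ratio (SNR), scaling the noise so that 10 log10(P_clean / P_noise) equals the target value in dB. The sketch below assumes audio as plain Python lists of samples; `mean_power` and `mix_at_snr` are hypothetical helper names, and nothing here is claimed to match the procedure used in the thesis.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Add `noise` to `clean` after scaling it so that
    10 * log10(P_clean / P_scaled_noise) == snr_db.
    Generic technique, assumed for illustration only."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    # Target noise power is P_clean / 10^(snr/10); amplitude scales with
    # the square root of the power ratio.
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


# Hypothetical example: a 440 Hz tone at 16 kHz mixed with seeded white noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
for snr in (20.0, 10.0, 0.0):
    mixed = mix_at_snr(clean, noise, snr)
    residual = [m - c for m, c in zip(mixed, clean)]
    achieved = 10 * math.log10(mean_power(clean) / mean_power(residual))
    print(f"target {snr:.0f} dB, achieved {achieved:.3f} dB")
```

Sweeping `snr_db` downwards and re-running an audio-only and an audiovisual classifier on each mixture is one way to produce a noise-level comparison of the kind the abstract describes.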
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available