Title: Speaker tracking in a joint audio-video network
Author: D'Arca, Eleonora
ISNI: 0000 0004 5989 1936
Awarding Body: Heriot-Watt University
Current Institution: Heriot-Watt University
Date of Award: 2015
Situational awareness is achieved naturally by combining the human senses of sight and hearing. System-level automatic scene understanding aims to replicate this human ability using cooperating microphones and cameras. In this thesis, we integrate and fuse audio and video signals at different levels of abstraction to detect and track a speaker in a scenario where people are free to move indoors. Despite the low complexity of the system, which consists of just four microphone pairs and one camera, results show that the overall multimodal tracker is more reliable than single-modality systems, tolerating large occlusions and cross-talk. The system is evaluated for both single-modality and multimodal tracking. The performance improvement provided by audio-video integration and fusion is quantified in terms of tracking precision and accuracy, as well as speaker diarisation error rate and precision-recall recognition metrics. Compared with the closest related works, we achieve a 56% reduction in the computational cost of audio-only sound source localisation and an 18% improvement in speaker diarisation error rate over a speaker-only unit.
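The abstract reports results as a speaker diarisation error rate (DER) without defining it. The commonly used definition is DER = (missed speech + false-alarm speech + speaker confusion) / total reference speech time. The Python sketch below computes a simplified frame-level version of that ratio as a hedged illustration only; it is not necessarily the DER variant used in the thesis. It assumes at most one active speaker per frame and hypothesis labels that are already mapped to reference labels, and it omits the optimal speaker mapping and forgiveness collar applied by standard NIST-style scoring. The function name `frame_der` and the example labels are hypothetical.

```python
from typing import Optional, Sequence


def frame_der(reference: Sequence[Optional[str]],
              hypothesis: Sequence[Optional[str]]) -> float:
    """Simplified frame-level diarisation error rate (hypothetical helper).

    Each element is the label of the single active speaker in that frame,
    or None for non-speech. Assumes hypothesis labels are already mapped to
    reference labels (no optimal mapping, no collar).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = 0
    speech_frames = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech_frames += 1
            if hyp is None:
                missed += 1          # reference speech, system output silence
            elif hyp != ref:
                confusion += 1       # speech detected but wrong speaker label
        elif hyp is not None:
            false_alarm += 1         # system output speech during silence

    if speech_frames == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech_frames


if __name__ == "__main__":
    ref = ["A", "A", "A", "A", None, "B", "B", "B", "B", None]
    hyp = ["A", "A", "A", None, None, "B", "A", "B", "B", "B"]
    # One missed frame, one confusion, one false alarm over 8 speech frames.
    print(frame_der(ref, hyp))  # prints 0.375
```

Normalising by reference speech time rather than total time is what lets DER exceed 1.0 when a system produces many false alarms; a lower value is better, so an "improvement" in DER means a reduction.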
Supervisor: Robertson, Neil; Hopgood, James
Sponsor: EPSRC
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available