Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.793555
Title: Particle flow PHD filtering for audio-visual multi-speaker tracking
Author: Liu, Yang
ISNI:       0000 0004 4692 2086
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2019
Abstract:
Tracking an unknown and time-varying number of targets (e.g., speakers) in indoor environments using audio-visual (AV) modalities has received increasing interest in numerous fields, including video conferencing, individual speaker discrimination, and human-computer interaction. The audio-visual sequential Monte Carlo probability hypothesis density (AV-SMC-PHD) filter is a popular baseline for multi-target tracking, offering an elegant framework for fusing audio-visual information and dealing with a varying number of speakers. However, the performance of this filter can be adversely affected by the weight degeneracy problem, where, as the algorithm iterates, the weights of most particles become very small while only a few remain significant. To address this issue, this thesis extends the AV-SMC-PHD filter by incorporating particle flows defined in terms of an ordinary differential equation and the Fokker-Planck equation. The thesis considers both zero and non-zero diffusion particle flows (ZPF/NPF) and develops two new algorithms, AV-ZPF-SMC-PHD and AV-NPF-SMC-PHD, in which the speaker states from the previous frames are also considered for particle relocation. The particle flow migrates particles from the prior distribution to the posterior distribution, using a homotopy function that defines the flow in synthetic time. The proposed methods can mitigate the particle degeneracy of the AV-SMC-PHD filter and improve tracking accuracy. Another issue is that the performance of multi-speaker tracking algorithms is often degraded by mis-detection and clutter in the measurements. To address this issue, this thesis proposes an intensity particle flow (IPF) SMC-PHD filter based on the intensity function derived from the measurements, informed by the clutter density and the detection probability.
The IPF-SMC-PHD filter improves tracking accuracy but incurs a high computational overhead, because it requires computing the sum of the likelihood intensity functions and the third-order derivative of the likelihood density. As a result, the computational complexity of IPF is proportional to the cube of the number of measurements. To address this problem, this thesis proposes a labelled particle flow (LPF) algorithm in which particle labels are estimated from the measurements from multiple sensors and then used to update particles and estimate speaker states. Since the LPF uses only the first-order derivative of the likelihood density and replaces the clustering step with the sum of particle states, it offers higher computational efficiency than other particle flow methods, where a clustering method is often used to estimate the target states. All the proposed methods are extensively evaluated on different datasets, such as AV16.3, AVDIAR and CLEAR. The results show that the proposed methods mitigate the weight degeneracy problem and offer higher tracking accuracy than the baseline methods in a variety of scenarios, such as occlusion and rapid movements of the speakers.
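The weight degeneracy problem and the idea of a zero-diffusion particle flow described in the abstract can be illustrated with a minimal sketch. This is not the thesis's AV-ZPF-SMC-PHD algorithm: it assumes a single target, a scalar state, and a linear-Gaussian measurement model (measurement equals state plus noise), for which a flow with drift `a * x + b` is known to carry the prior to the posterior as the synthetic time `lam` runs from 0 to 1. The function names, parameter values, and the Euler integration scheme are all illustrative assumptions.

```python
import numpy as np


def effective_sample_size(weights):
    """Effective sample size; values far below len(weights) indicate degeneracy."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)


def zero_diffusion_flow(particles, z, prior_mean, prior_var, meas_var, steps=500):
    """Move particles from prior to posterior along synthetic time lam in [0, 1].

    Assumes a scalar linear-Gaussian model z = x + noise (illustrative only).
    The drift a * x + b is integrated with a simple Euler scheme.
    """
    x = np.array(particles, dtype=float)
    d_lam = 1.0 / steps
    for k in range(steps):
        lam = (k + 0.5) * d_lam  # evaluate the drift at the step midpoint
        a = -0.5 * prior_var / (lam * prior_var + meas_var)
        b = (1.0 + 2.0 * lam * a) * (
            (1.0 + lam * a) * prior_var * z / meas_var + a * prior_mean
        )
        x += d_lam * (a * x + b)
    return x


# Hypothetical scenario: the measurement falls in the tail of the prior.
rng = np.random.default_rng(0)
prior_mean, prior_var, meas_var, z = 0.0, 1.0, 0.1, 3.0
particles = rng.normal(prior_mean, np.sqrt(prior_var), size=1000)

# Plain importance weighting: only particles near z keep significant weight.
log_w = -0.5 * (z - particles) ** 2 / meas_var
weights = np.exp(log_w - log_w.max())
ess_plain = effective_sample_size(weights)

# Particle flow: particles are relocated and keep equal weights.
flowed = zero_diffusion_flow(particles, z, prior_mean, prior_var, meas_var)

# Closed-form Gaussian posterior, used here only as a reference.
post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
post_mean = post_var * (prior_mean / prior_var + z / meas_var)

print("ESS with plain weighting:", ess_plain)
print("flowed mean/var:", flowed.mean(), flowed.var())
print("posterior mean/var:", post_mean, post_var)
```

In this toy setting the plain importance weights concentrate on a handful of particles, whereas the flowed particles stay equally weighted and their sample mean and variance approach the closed-form posterior. The multi-speaker, audio-visual case treated in the thesis additionally involves the PHD intensity, clutter, and detection probability, none of which are modelled here.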
Supervisor: Wang, Wenwu ; Hilton, Adrian
Sponsor: University of Surrey ; BBC ; China Scholarship Council (CSC)
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.793555