Title: Machine learning techniques for holistic computational paralinguistics
Author: Zhang, Yue
ISNI: 0000 0004 7658 6132
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2018
Analysing the voice behind the words represents a central aspect of human communication and is thus key to intelligent machines such as social robots and virtual assistants. In the realm of Computational Paralinguistics (CP), research has aimed at more natural human-computer interaction by enabling machines to recognise various speaker characteristics, social signals and other non-verbal cues. Shaped by the standard methodology in machine learning, research in CP has devised single-task learning systems for recognising dozens of speaker attributes. However, human-like machine perception demands the holistic analysis of paralinguistic phenomena, learning how these are interrelated and using this knowledge to gain a better understanding of the whole context. Thus, in this thesis, two major challenges in current research are addressed: (i) overcoming the bottleneck of data annotation to meet the urgent need for labelled data; (ii) exploiting task relatedness when learning multiple tasks at the same time. To tackle the first problem, efficient data annotation methods based on active learning and semi-supervised learning are proposed, with the aim of leveraging the vast amount of unlabelled data for training while reducing manual labelling effort. To determine whether human annotators are required, and how many, given label ambiguity, a generic cooperative learning framework is introduced, using measures of model confidence for human-machine arbitration and an early-stopping criterion based on inter-rater agreement. Techniques of this kind are particularly important for subjective tasks that do not come with ground truth labels. The effectiveness of the proposed algorithms is demonstrated on the tasks of speech emotion recognition and prosody estimation. To enable holistic analysis in meeting the second thesis objective, it is important to know which related tasks may mutually benefit from joint learning, and thus to avoid negative transfer.
Building upon feature relevance analysis and non-metric multidimensional scaling, a data-driven approach is proposed to construct a paralinguistic atlas, in which task similarities are transformed into distance measures in a two-dimensional space. On this basis, various multi-task learning methods are applied to related tasks to improve generalisation performance by means of data aggregation, feature relevance analysis, auxiliary labels, and shared representation learning. In particular, two approaches have been conceived to handle the prevalence of single-label databases, which is a main obstacle to holistic analysis. One way to exploit task interdependencies is to generate multi-label datasets. This is achieved with the proposed cross-task labelling method, which automatically completes the missing target labels, e.g., when performing joint learning of deception detection and sincerity classification. Alternatively, the single-label datasets at hand can be fed into a multi-task deep neural network, which learns different tasks in parallel through shared hidden layers and separate output layers. This method also effectively addresses the 'universality' aspect of the holistic concept, i.e., recognising various paralinguistic patterns such as different emotion expressions simultaneously.
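The multi-task network described at the end of the abstract (shared hidden layers, separate output layers, single-label training data) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the single tanh hidden layer, the layer sizes, the task names "emotion" and "deception", and the helper names `init_params`, `forward` and `single_label_loss` are all assumptions made for illustration only.

```python
import numpy as np


def init_params(n_features, n_hidden, task_classes, seed=0):
    """Create one shared hidden layer plus one output head per task (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    shared = {"W": rng.normal(0.0, 0.1, (n_features, n_hidden)),
              "b": np.zeros(n_hidden)}
    heads = {task: {"W": rng.normal(0.0, 0.1, (n_hidden, k)), "b": np.zeros(k)}
             for task, k in task_classes.items()}
    return shared, heads


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x, shared, heads):
    """Compute the shared representation once, then one class distribution per task."""
    h = np.tanh(x @ shared["W"] + shared["b"])  # representation shared by all tasks
    return {task: softmax(h @ p["W"] + p["b"]) for task, p in heads.items()}


def single_label_loss(probs, task, labels):
    """Cross-entropy for the one task a batch is labelled for; other heads are ignored."""
    picked = probs[task][np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())


# Hypothetical usage: a batch from a database that only carries deception labels.
shared, heads = init_params(n_features=6, n_hidden=8,
                            task_classes={"emotion": 4, "deception": 2})
x = np.random.default_rng(1).normal(size=(5, 6))
probs = forward(x, shared, heads)
loss = single_label_loss(probs, "deception", np.array([0, 1, 1, 0, 1]))
```

Because each batch contributes a loss only through the head of the task it is labelled for, gradients from every single-label database would still update the shared layer, which is the mechanism by which related tasks can inform one another in this kind of architecture.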
Supervisor: Schuller, Bjorn W.
Sponsor: European Union
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral