Title: A statistical approach to spoken language understanding
Author: He, Y.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2004
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
The research work described here focuses on statistical learning approaches for building a purely data-driven spoken language understanding (SLU) system whose three major components, the speech recognizer, the semantic parser, and the dialogue act decoder, are trained entirely from data. The system is comparable to existing SLU systems, which rely on either hand-crafted semantic grammar rules or statistical models trained on fully annotated training corpora, but it has a greatly reduced build cost. The core of the system is a novel hierarchical semantic parser model called the Hidden Vector State (HVS) model. Unlike other hierarchical parsing models, which require fully annotated treebank data for training, the HVS model can be trained using only lightly annotated data whilst retaining sufficient ability to capture the hierarchical structure needed to robustly extract task domain semantics. The HVS parser is combined with a dialogue act detector based on Naive Bayesian networks, which have been extended and refined by introducing Tree-Augmented Naive Bayes networks (TANs) to allow inter-concept dependencies to be robustly modelled. Finally, the two semantic analyzer components, the HVS semantic parser and the modified-TAN dialogue act decoder, have been integrated with a standard HTK-based Hidden Markov Model (HMM) speech recognizer, and the additional knowledge provided by the semantic analyzer has been used to determine the best-scoring word hypothesis from the N-best lists generated by the speech recognizer. This purely data-driven SLU system has been built and tested using both the ATIS and DARPA Communicator test sets. In addition to testing on clean data, the system has been tested on various levels of noisy data and on modified application domains.
The results support the claim that an SLU system which is statistically based and trained entirely from data is intrinsically robust and can be readily adapted to new applications.
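The N-best rescoring step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the log-linear combination of scores, the `semantic_weight` parameter, the `Hypothesis` fields, and the example sentences and scores are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: str
    recognizer_logprob: float  # hypothetical score from the speech recognizer
    semantic_logprob: float    # hypothetical score from the semantic analyzer


def rescore_nbest(nbest, semantic_weight=1.0):
    """Return the hypothesis with the highest combined log score.

    The weighted sum used here is an assumed combination rule,
    not one taken from the thesis.
    """
    if not nbest:
        raise ValueError("N-best list is empty")
    return max(
        nbest,
        key=lambda h: h.recognizer_logprob + semantic_weight * h.semantic_logprob,
    )


# Made-up N-best list: the recognizer slightly prefers the second entry,
# but the semantic score favours the first.
nbest = [
    Hypothesis("show me flights to boston", -12.0, -3.0),
    Hypothesis("show me flights to austin", -11.5, -6.0),
]
best = rescore_nbest(nbest)
```

With `semantic_weight=0.0` the function falls back to the recognizer's own ranking, which makes the contribution of the semantic score easy to isolate.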
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available