Title: Learning from semantically heterogeneous aggregate data in a distributed environment
Author: Zhang, Shuai
ISNI:       0000 0004 2720 2745
Awarding Body: University of Ulster
Current Institution: Ulster University
Date of Award: 2009
Information cooperation, reuse and integration can be developed on the platform of rapidly growing open distributed environments and can support development of Ambient Intelligence. However, in such environments, information may be only partially observed due to the unreliability of data collection technologies and heterogeneity in the ontologies employed caused by distributed and independent system development. These challenges need to be overcome to facilitate intelligent data analysis. We focus on the use of large-scale databases such as statistical databases and data warehouses, where aggregates can be obtained to summarise information; such aggregates are valuable in providing efficient access, computation and communication. A principle-based learning framework is proposed and developed for semantically heterogeneous aggregate data using maximum likelihood techniques via the EM (Expectation-Maximisation) algorithm. The learning framework inherently handles data incompleteness and schema heterogeneity from unreliable, incomplete or uncertain information sources. The framework is developed for supervised and unsupervised learning from data in a distributed environment. This development is demonstrated using two scenarios. In the first scenario a decision-making mechanism is proposed to support assistive living for elderly people in a smart home environment. The mechanism incorporates modules for learning inhabitants' activities of daily living based on partially observed and unlabelled data, enabling hierarchical activity prediction and assisting inhabitants in completing activities by providing personalised reminders. Real data have been collected in a smart kitchen laboratory, and realistic synthetic data are also used for evaluation. Results show consistent and robust performance and other information and insights are also obtained. 
In the second scenario a model-based clustering algorithm is proposed for independently developed distributed heterogeneous databases to support cooperation between organisations, including distributed smart homes from different institutions. Clustering in the presence of data heterogeneity enables the characteristics of similar contexts to be captured. The algorithm is systematically evaluated using simulated data, with encouraging results and good scalability to large numbers of databases.
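The abstract's central idea, maximum likelihood estimation via EM from aggregate counts reported under differing classification schemes, can be illustrated with a small sketch. This is not the thesis's framework; it is a minimal illustration under stated assumptions: a single categorical attribute, a shared set of fine-grained categories, and each database reporting counts over its own partition of those categories. The function name `em_aggregates` and the toy data are hypothetical.

```python
def em_aggregates(tables, num_categories, iterations=200):
    """Estimate fine-grained category probabilities from aggregate counts.

    tables: one list per database of (cell, count) pairs, where `cell` is a
    tuple of fine-grained category indices that the database merges into a
    single reported class (an assumption made for this illustration).
    """
    # Start from a uniform distribution over the fine-grained categories.
    p = [1.0 / num_categories] * num_categories
    total = sum(count for table in tables for _, count in table)

    for _ in range(iterations):
        expected = [0.0] * num_categories
        # E-step: share each aggregate count among the categories in its
        # cell, in proportion to the current probability estimates.
        for table in tables:
            for cell, count in table:
                if count == 0:
                    continue
                mass = sum(p[k] for k in cell)
                for k in cell:
                    expected[k] += count * p[k] / mass
        # M-step: re-estimate probabilities from the expected counts.
        p = [e / total for e in expected]
    return p


# Toy data: database A distinguishes all three categories,
# database B merges categories 0 and 1 into one class.
tables = [
    [((0,), 30), ((1,), 50), ((2,), 20)],
    [((0, 1), 80), ((2,), 20)],
]
p = em_aggregates(tables, num_categories=3)
```

For this toy data the estimate settles at approximately (0.3, 0.5, 0.2): the coarse count of 80 from database B is split between categories 0 and 1 in the ratio that database A observes directly.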
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available