Title: Data mining temporal and indefinite relations with numerical dependencies
Author: Collopy, Ethan Richard
ISNI: 0000 0001 3560 8743
Awarding Body: University of London
Current Institution: University College London (University of London)
Date of Award: 1999
Availability of Full Text: Full text unavailable from EThOS.
Abstract: We propose that data mining, the search for useful, non-trivial and previously unknown information within a database, can be successfully performed by using Numerical Dependencies (NDs), a generalisation of Functional Dependencies (FDs), to model the data, together with resampling, a computationally intensive statistical sampling process that allows us to make inferences from temporal and indefinite databases. We use NDs to model relations containing temporal and indefinite information. We extend the theory of NDs by presenting measures for data mining, and we generalise the chase procedure, a method for updating a relation to satisfy a constraint set, to NDs. We motivate NDs in real-world applications by introducing a database design tool. The consistency problem, that of attempting to find a relation satisfying a set of FDs within an indefinite relation, is known to be NP-complete; we study it in the context of using NDs for approximation. We employ resampling, based on taking samples of definite relations from indefinite ones, with incrementally increasing sample sizes until an approximate fixpoint is reached, which denotes an upper bound on the required sample size. Extensive simulations show that resampling to find upper bounds, in conjunction with the chase for indefinite relations, returns valid approximate solutions. We also study NDs in temporal sequences of relations for knowledge discovery purposes. Each relation within a sequence is mined for a set of NDs, which evolve as the data are updated. We introduce a temporal logic for the discovery of rules and properties within these sequences, or subsequences, which includes statistical functions within the temporal operators for time series analysis. We also show that time series data may be analysed using a restricted subset of the logic. We apply discovery algorithms to both sequences and resampled sequences, allowing smoothing for trend detection. Investigations presented herein show that these rules provide interesting and practicable results.
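Illustrative note (not part of the thesis record): the ND notion used in the abstract can be sketched in a few lines. The sketch assumes the usual reading of an ND X ->k Y, namely that each X-value is associated with at most k distinct Y-values, so that an FD is the special case k = 1. The function names, the dictionary-based encoding of a relation and the sample data are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def min_branching(rows, lhs, rhs):
    """Return the smallest k for which lhs ->k rhs holds in rows.

    Assumed reading of an ND: every combination of lhs values occurs
    with at most k distinct combinations of rhs values.
    Returns 0 for an empty relation.
    """
    images = defaultdict(set)
    for row in rows:
        x = tuple(row[a] for a in lhs)  # projection onto the left-hand side
        y = tuple(row[a] for a in rhs)  # projection onto the right-hand side
        images[x].add(y)
    return max((len(ys) for ys in images.values()), default=0)


def satisfies_nd(rows, lhs, rhs, k):
    """Check whether the relation satisfies lhs ->k rhs."""
    return min_branching(rows, lhs, rhs) <= k


# Hypothetical toy relation: one lecturer teaches two distinct courses.
relation = [
    {"lecturer": "Ann", "course": "DB"},
    {"lecturer": "Ann", "course": "AI"},
    {"lecturer": "Bob", "course": "DB"},
]

print(min_branching(relation, ["lecturer"], ["course"]))    # smallest valid k
print(satisfies_nd(relation, ["lecturer"], ["course"], 1))  # FD check (k = 1)
print(satisfies_nd(relation, ["lecturer"], ["course"], 2))  # ND check with k = 2
```

In this toy relation the FD lecturer -> course fails because one lecturer is paired with two courses, while the weaker ND with k = 2 holds. Computing the smallest valid k per attribute pair is one simple way to see how a mined ND could tighten or loosen as the data are updated.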
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Information; Databases; Statistical sampling