Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.656076
Title: Mining time-series data using discriminative subsequences
Author: Hills, Jonathan F. F.
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2014
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
Time-series data is abundant, and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, and a comprehensible method of understanding both data and models. For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than the previous best shapelet-based classifiers. We use two methods to increase interpretability: First, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length, reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular shapelets, a representation that is straightforward to interpret and understand. We supplement the ensemble classifier with partial classifocation. We generate rule sets on the binary-shapelet data, improving performance on certain classes, and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model. Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their performance in terms of speed and accuracy on synthetic data, and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically and in an unsupervised manner using a local-shape-based approach.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.656076  DOI: Not available
Share: