Title: Discovering patterns and anomalies in graphs with discrete and numeric attributes
Author: Davis, Michael
ISNI: 0000 0004 5369 654X
Awarding Body: Queen's University Belfast
Current Institution: Queen's University Belfast
Date of Award: 2014
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
In this thesis, we investigate pattern mining and anomaly detection in datasets with both structural and numeric attributes. Graphs are used to represent complex structures such as social networks, infrastructure networks, information networks and chemical compounds. Many graph datasets are annotated with numeric labels or weights. We show that numeric attributes are closely related to graph structure, and exploit this observation for substructure discovery and anomaly detection. Our first contribution is Agwan (Attribute Graphs: Weighted and Numeric), a generative model for random graphs with discrete labels and weighted edges. We present algorithms for parameter fitting and graph generation. Using real-world directed and undirected graphs as input, we compare our approach to state-of-the-art random graph generators and draw conclusions about the contribution of vertex labels and edge weights to graph structure. Our second contribution is a constraint on substructure discovery based on the 'outlierness' of numeric graph attributes, which is used to improve search performance and discrimination. In our experiments, we implement our method as a pre-processing step that prunes anomalous vertices and edges prior to graph mining, allowing us to evaluate it on both graph databases and single large graphs. We measure the effect on runtime, memory requirements and coverage of discovered patterns, relative to the unconstrained approaches. We also outline how our method can be extended to high-dimensional numeric features. Finally, we present Yagada (Yet Another Graph-based Anomaly Detection Algorithm), an algorithm to search for anomalies in graphs with numeric attributes. Yagada is explained using several security-related examples and validated with experiments on a physical access control database. Quantitative analysis shows that in the upper range of anomaly thresholds, Yagada detects twice as many anomalies as the best-performing numeric discretization algorithm. Qualitative evaluation shows that the detected anomalies are meaningful, representing a combination of structural irregularities and numerical outliers.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available