Title: Anomaly-based network intrusion detection enhancement by prediction threshold adaptation of binary classification models
Author: Al Tobi, Amjad Mohamed
ISNI: 0000 0004 7656 833X
Awarding Body: University of St Andrews
Current Institution: University of St Andrews
Date of Award: 2018
Network traffic exhibits a high level of variability over short periods of time. This variability degrades the performance (accuracy) of anomaly-based network Intrusion Detection Systems (IDS) that are built using predictive models in a batch-learning setup. This thesis investigates how adapting the discriminating threshold of model predictions to the traffic being evaluated improves the detection rates of these intrusion detection models. Specifically, it studies the adaptability features of three well-known Machine Learning algorithms: C5.0, Random Forest, and Support Vector Machine. The ability of these algorithms to adapt their prediction thresholds was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. A new dataset (STA2018) was generated for this thesis and used for the analysis. The thesis demonstrates empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation (test) traffic have different statistical properties. Further investigation analysed the effects of feature selection and data balancing on a model's accuracy when evaluation traffic with different significant features was used. The effect of threshold adaptation in reducing the accuracy degradation of these models was statistically analysed. The results showed that, of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates. The analysis was then extended to apply threshold adaptation to sampled traffic subsets, using different sample sizes, sampling strategies and label error rates. This investigation showed the robustness of the Random Forest algorithm in identifying the best threshold: it needed a sample of only 0.05% of the original evaluation traffic to identify a discriminating threshold whose overall accuracy was nearly 90% of that achieved with the optimal threshold.
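The idea of adapting a prediction threshold from a small labelled sample can be illustrated with a minimal sketch. This is not the thesis's implementation: the scores, labels, sample indices, and the function names `accuracy_at` and `adapt_threshold` are all hypothetical, and a simple grid search over candidate thresholds is assumed as the selection method.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of flows classified correctly when score >= threshold means 'attack'."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def adapt_threshold(scores, labels, candidates=None):
    """Return the candidate threshold with the highest accuracy on a labelled sample.

    Assumption: a plain grid search; ties resolve to the lowest candidate.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: accuracy_at(t, scores, labels))


# Hypothetical model scores for evaluation traffic whose statistics differ
# from the training traffic: attacks (label 1) now score well below 0.5.
eval_scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.40, 0.45, 0.60]
eval_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# A small labelled subset of the evaluation traffic (indices chosen by hand here).
sample_idx = [1, 4, 5, 8]
sample_scores = [eval_scores[i] for i in sample_idx]
sample_labels = [eval_labels[i] for i in sample_idx]

threshold = adapt_threshold(sample_scores, sample_labels)
default_acc = accuracy_at(0.5, eval_scores, eval_labels)
adapted_acc = accuracy_at(threshold, eval_scores, eval_labels)

print(f"default threshold 0.50: accuracy {default_acc:.2f}")
print(f"adapted threshold {threshold:.2f}: accuracy {adapted_acc:.2f}")
```

In this toy data the fixed 0.5 cut-off misses most attacks, while the threshold chosen from four labelled flows separates the two classes on the full evaluation set. Real traffic would not separate this cleanly, and the sample would be drawn by a sampling strategy rather than picked by hand.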
Supervisor: Duncan, Ishbel Mary Macdonald
Sponsor: Ministry of Higher Education, Oman ; Jāmiʻat al-Sulṭān Qābūs
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Intrusion detection system ; Anomaly-based IDS ; Threshold adaptation ; Prediction accuracy improvement ; Machine learning ; STA2018 dataset ; C5.0 algorithm ; Random forest algorithm ; Support vector machine algorithm ; TK5105.59A6 ; Intrusion detection systems (Computer security)