Title: A machine learning clustering technique for autism screening and other applications
Author: Baadel, Said
ISNI: 0000 0004 8504 145X
Awarding Body: University of Huddersfield
Current Institution: University of Huddersfield
Date of Award: 2019
Clustering is among the more challenging machine learning techniques because it is unsupervised. While many clustering algorithms constrain each object to a single cluster, overlapping partitioning methods based on K-means relax this constraint and allow objects to belong to more than one cluster, to better fit hidden structures in the data. However, when a dataset contains outliers, they can significantly influence the mean distance of the data objects to their respective clusters, which is a drawback. Most researchers address this problem by simply removing the outliers. This can be problematic, especially in applications such as autism screening, fraud detection, and cybersecurity attacks, among others. In this thesis, an alternative solution is proposed that captures outliers and stores them on the fly within a new cluster instead of discarding them. The new algorithm is named Outlier-based Multi-Cluster Overlapping K-Means Extension (OMCOKE). The algorithm addresses an issue previously ignored by other work in overlapping clustering and can therefore benefit various stakeholders, since these outliers could have real-life applications. The proposed solution has been evaluated on a crucial behavioural science problem, the screening of autistic traits, to improve the performance of detecting autism spectrum disorder (ASD) traits and reduce feature redundancy. OMCOKE was integrated as a learning algorithm within a semi-supervised ML framework called Clustering based Autistic Trait Classification (CATC) in Chapter 5. Based on experimental results obtained on real datasets related to autism screening, OMCOKE was able to identify potential autism cases based on the similarity of their traits, as opposed to the conventional scoring functions used by ASD screening tools.
Moreover, the empirical results obtained by OMCOKE on different datasets involving children, adolescents, and adults were compared to results produced by common ML techniques. The results showed that our semi-supervised framework offers models with higher predictive accuracy, sensitivity, and specificity rates than those of other intelligent classification approaches such as Artificial Neural Network (ANN), Random Forest, Random Trees, and Rule Induction. These models are useful since they can be exploited by diagnosticians and other stakeholders involved in ASD screening, and they also highlight the most influential features. The chapters in this thesis have been disseminated or are under review in various reputable journals and in refereed conference proceedings.
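The idea of keeping outliers in their own cluster rather than discarding them can be illustrated with a minimal Python sketch. This is not the thesis's OMCOKE algorithm: the function name, the distance-ratio overlap rule, the fixed `outlier_threshold`, and the toy data are all assumptions made for illustration only.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def overlapping_kmeans_with_outliers(points, centroids, overlap_ratio=1.5,
                                     outlier_threshold=10.0, max_iter=20):
    """Illustrative overlapping K-means-style loop with an outlier cluster.

    Assumed heuristics (not taken from the thesis):
    - a point joins every cluster whose centroid lies within
      overlap_ratio times the distance to its nearest centroid;
    - a point farther than outlier_threshold from every centroid is
      stored in a separate outlier cluster and does not affect the means.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        outliers = []
        for p in points:
            dists = [dist(p, c) for c in centroids]
            nearest = min(dists)
            if nearest > outlier_threshold:
                outliers.append(p)  # kept, not discarded
                continue
            for k, d in enumerate(dists):
                if d <= overlap_ratio * nearest:
                    clusters[k].append(p)  # may land in several clusters
        # Recompute each centroid from its members; keep it if the cluster is empty.
        new_centroids = [mean(m) if m else c for m, c in zip(clusters, centroids)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, outliers, centroids

# Toy data: two groups, one point between them, and one distant point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5), (50, 50)]
clusters, outliers, centroids = overlapping_kmeans_with_outliers(
    points, centroids=[(0, 0), (10, 10)]
)
print(clusters)
print(outliers)
```

In this toy run the point `(5, 5)` is assigned to both clusters, while `(50, 50)` is retained in the outlier list and so does not pull either centroid towards it.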
Supervisor: Lu, Joan ; Thabtah, Fadi Abdeljaber ; Xu, Qiang
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: H Social Sciences (General) ; HA Statistics ; HV Social pathology. Social and public welfare ; L Education (General) ; Q Science (General) ; QA75 Electronic computers. Computer science ; QA76 Computer software