Use this URL to cite or link to this record in EThOS:
Title: Machine learning ensemble method for discovering knowledge from big data
Author: Farrash, Majed
ISNI:       0000 0004 5916 0287
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Access from Institution:
Big data, generated from various business internet and social media activities, has become a big challenge to researchers in the field of machine learning and data mining to develop new methods and techniques for analysing big data effectively and efficiently. Ensemble methods represent an attractive approach in dealing with the problem of mining large datasets because of their accuracy and ability of utilizing the divide-and-conquer mechanism in parallel computing environments. This research proposes a machine learning ensemble framework and implements it in a high performance computing environment. This research begins by identifying and categorising the effects of partitioned data subset size on ensemble accuracy when dealing with very large training datasets. Then an algorithm is developed to ascertain the patterns of the relationship between ensemble accuracy and the size of partitioned data subsets. The research concludes with the development of a selective modelling algorithm, which is an efficient alternative to static model selection methods for big datasets. The results show that maximising the size of partitioned data subsets does not necessarily improve the performance of an ensemble of classifiers that deal with large datasets. Identifying the patterns exhibited by the relationship between ensemble accuracy and partitioned data subset size facilitates the determination of the best subset size for partitioning huge training datasets. Finally, traditional model selection is inefficient in cases wherein large datasets are involved.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available