Title: Framework for web log pre-processing within web usage mining
Author: Khairo-Sindi, Mazin Omar
ISNI: 0000 0001 3597 9800
Awarding Body: University of Manchester : UMIST
Current Institution: University of Manchester
Date of Award: 2004
Web mining is gaining popularity, and information technology specialists and businesses alike now appreciate the role of the web in providing invaluable information about users' behaviour and navigational patterns. Nevertheless, given the enormity of the web and the complexities involved in delivering and retrieving electronic information, extracting a set of minable objects from huge volumes of raw web log data is difficult. This difficulty, together with the fact that web mining is a new science, may explain why research on data pre-processing is still limited in scope. Although the debate on major issues is gaining momentum, no attempt has yet been made to establish a coherent and accurate web usage pre-processing framework.
As a contribution to this debate, this research aims to formulate a workable, reliable and coherent pre-processing framework. The study addresses four issues: enhancing and maximising knowledge about every visit made to a given website by drawing on multiple web logs, even when they have different schemas; improving the elimination of excessive web log data that are unrelated to users' behaviour; modifying existing approaches to session identification to obtain more accurate results; and eliminating the redundant data that result from repeatedly adding cached data to the web logs regardless of whether or not the added page is a frameset. In addition to these improvements, the study introduces a novel task, "automatic web log integration", which makes it possible to integrate web logs with different schemas into a unified data set. Finally, the study extends the non-user request removal task to cover unnecessary information, particularly that pertaining to malicious website visits. Together, the suggested improvements and the novel tasks result in a coherent pre-processing framework.
To test the reliability and validity of the framework, a website is created to carry out the necessary experimental work, and a prototype pre-processing tool is developed and employed to support it.
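The abstract names the pre-processing tasks but not how each is carried out. As a rough illustration only, the Python sketch below chains conventional baseline versions of three of them: mapping two log schemas (NCSA Combined and W3C Extended) into one hypothetical `LogRecord` type, removing non-user requests, and splitting the remaining requests into sessions with an inactivity timeout. It is not the framework proposed in the thesis. The record fields, the filter lists (`EMBEDDED`, `BOT_HINTS`, `ATTACK_PATHS`), the host-plus-user-agent visitor key and the 30-minute timeout are all assumptions, and the two hand-written parsers stand in for, rather than implement, automatic web log integration.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class LogRecord:
    # Hypothetical unified schema; frozen so records are hashable for de-duplication.
    host: str
    time: datetime
    method: str
    url: str
    status: int
    agent: str

# Schema A: NCSA Combined Log Format (timestamps carry a UTC offset).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"')

def parse_combined(lines: Iterable[str]) -> List[LogRecord]:
    records = []
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip malformed lines instead of aborting
        t = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        records.append(LogRecord(m["host"], t, m["method"], m["url"],
                                 int(m["status"]), m["agent"]))
    return records

# Schema B: W3C Extended Log Format; columns come from the "#Fields:" directive
# and date/time are assumed to be UTC.
def parse_w3c(lines: Iterable[str]) -> List[LogRecord]:
    fields: List[str] = []
    records = []
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line.strip() and not line.startswith("#"):
            row = dict(zip(fields, line.split()))
            t = datetime.strptime(row["date"] + " " + row["time"],
                                  "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
            agent = row.get("cs(User-Agent)", "-").replace("+", " ")
            records.append(LogRecord(row["c-ip"], t, row["cs-method"],
                                     row["cs-uri-stem"], int(row["sc-status"]), agent))
    return records

def integrate(*sources: List[LogRecord]) -> List[LogRecord]:
    # Merge all sources into one time-ordered list, dropping exact duplicates.
    merged = dict.fromkeys(r for source in sources for r in source)
    return sorted(merged, key=lambda r: r.time)

EMBEDDED = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
BOT_HINTS = ("bot", "crawler", "spider")
ATTACK_PATHS = ("cmd.exe", "root.exe", "default.ida")  # illustrative worm probes

def is_user_request(r: LogRecord) -> bool:
    path = r.url.lower().split("?", 1)[0]
    return (r.method in ("GET", "POST") and 200 <= r.status < 400  # successful page requests
            and not path.endswith(EMBEDDED) and path != "/robots.txt"  # no embedded objects or crawler files
            and not any(h in r.agent.lower() for h in BOT_HINTS)  # no self-identified robots
            and not any(p in path for p in ATTACK_PATHS))  # no malicious probes

def sessionize(records: Iterable[LogRecord],
               timeout: timedelta = timedelta(minutes=30)) -> List[List[LogRecord]]:
    # A visitor is approximated by (host, user agent); a gap longer than
    # `timeout` since that visitor's previous request starts a new session.
    sessions: List[List[LogRecord]] = []
    latest: Dict[Tuple[str, str], int] = {}
    for r in sorted(records, key=lambda r: r.time):
        i = latest.get((r.host, r.agent))
        if i is not None and r.time - sessions[i][-1].time <= timeout:
            sessions[i].append(r)
        else:
            latest[(r.host, r.agent)] = len(sessions)
            sessions.append([r])
    return sessions

combined_log = [
    '10.0.0.1 - - [10/Oct/2003:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0"',
    '10.0.0.1 - - [10/Oct/2003:13:55:37 +0000] "GET /logo.gif HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '10.0.0.2 - - [10/Oct/2003:13:56:00 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "Googlebot/2.1"',
]
w3c_log = [
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)",
    "2003-10-10 14:10:00 10.0.0.1 GET /products.html 200 Mozilla/4.0",
    "2003-10-10 15:00:00 10.0.0.1 GET /contact.html 200 Mozilla/4.0",
    "2003-10-10 15:01:00 10.0.0.3 GET /scripts/root.exe 404 -",
]
records = integrate(parse_combined(combined_log), parse_w3c(w3c_log))
sessions = sessionize(r for r in records if is_user_request(r))
print([[r.url for r in s] for s in sessions])
# [['/index.html', '/products.html'], ['/contact.html']]
```

In this toy run the image, the robot request and the failed probe are removed, and the `/contact.html` request starts a new session because it follows a 50-minute gap. The more accurate session identification and the frameset-aware removal of redundant cached-page entries described in the abstract are not modelled here.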
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available