Title: Investigating machine learning methods in recommender systems
Author: Michailidis, Marios
ISNI: 0000 0004 7227 7359
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2017
This thesis investigates the use of machine learning to improve predictions of the top K product purchases at a particular retailer. The data used for this research is a freely available (for research) sample of the retailer’s transactional data, spanning 102 weeks and consisting of several million observations. The thesis consists of four key experiments:

1. Univariate Analysis of the Dataset: The first experiment sets the background to the following chapters. It provides explanatory insight into the customers’ shopping behaviour and identifies the drivers that connect customers and products. Using various behavioural, descriptive and aggregated features, the training dataset for a group of customers is created to map their future purchasing actions for one specific week. The test dataset is then constructed to predict the purchasing actions for the forthcoming week. The chapter also serves as an introduction to the features included in the subsequent algorithmic processes.

2. Meta-modelling to predict top K products: The second experiment investigates whether meta-modelling improves the prediction of the top K products, measured by precision at K (precision@K) and Area Under Curve (AUC). It compares a meta-modelling framework, in which a range of common supervised machine learning algorithms each generate a model that becomes an input to a secondary model, against any single model involved, a field benchmark and simple model combination methods.

3. Hybrid method to predict repeated, promotion-driven product purchases in an irregular testing environment: The third experiment demonstrates a hybrid methodology of cross-validation, modelling and optimization for improving the accuracy of predicting the products the customers of a retailer will buy after having bought them at least once with a promotional coupon.
This methodology is applied in the context of a train and test environment with limited overlap: the test data includes different coupons, different customers and different time periods. Additionally, this chapter serves as a real-life application and stress test of the feature engineering findings from experiment 1. It also borrows ideas from ensemble (or meta) modelling as detailed in experiment 2.

4. The StackNet model: The fourth experiment proposes a framework in the form of a scalable version of stacked generalization [Wolpert 1992], extended through cross-validation methods to many levels. In structure it resembles a fully connected feedforward neural network in which the hidden nodes represent complex functions in the form of machine learning models of any nature. The implementation of the model is made available in the Java programming language.

The research contribution of this thesis is to improve the recommendation science used in the grocery and Fast Moving Consumer Goods (FMCG) markets. It seeks to identify methods of increasing the accuracy of predicting what customers are going to buy in the future by leveraging up-to-date innovations in machine learning as well as improving current processes in the areas of feature engineering, data pre-processing and ensemble modelling. For the general scientific community, this thesis can be used to better understand the type of data available in the grocery market and to gain insights into how to structure similar machine learning and analytical projects. The extensive computational and algorithmic framework that accompanies this thesis is also available for general use as a prototype to solve similar data challenges.

References:
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259.
Yang, X., Steck, H., Guo, Y., & Liu, Y. (2012). On top-k recommendation using social networks. In Proceedings of the sixth ACM conference on Recommender systems (pp. 67-74). ACM.
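
The precision@K metric named in the second experiment can be illustrated with a minimal sketch. It assumes the common definition (the fraction of the K highest-ranked products that the customer actually bought); this record does not state the thesis's exact formulation. The function name, product names, scores and basket below are hypothetical and are not taken from the thesis or its dataset.

```python
from typing import Hashable, Iterable, Sequence


def precision_at_k(ranked_items: Sequence[Hashable],
                   purchased: Iterable[Hashable],
                   k: int) -> float:
    """Fraction of the top-k ranked items that were actually purchased.

    Assumed definition: hits in the first k positions divided by k.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(purchased)
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Divide by k rather than len(top_k), so a list shorter than k
    # is not rewarded for recommending fewer products.
    return hits / k


# Hypothetical model scores for one customer (higher = more likely to buy).
scores = {"milk": 0.91, "bread": 0.85, "eggs": 0.40, "tea": 0.33, "jam": 0.10}
ranked = sorted(scores, key=scores.get, reverse=True)

# Hypothetical basket observed in the following week.
bought_next_week = {"milk", "eggs", "jam"}

# Two of the top three recommendations (milk, eggs) were bought.
print(precision_at_k(ranked, bought_next_week, k=3))
```

In practice the value would be averaged over all customers in the test week; the single-customer case is shown only to keep the sketch short.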
Supervisor: Treleaven, P. ; Giles, P.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available