Title: A generic data representation for predicting player behaviours
Author: Xie, Hanting
ISNI: 0000 0004 7227 0122
Awarding Body: University of York
Current Institution: University of York
Date of Award: 2017
A common use of predictive models in game analytics is to predict player behaviours so that pre-emptive measures can be taken before players make undesired decisions. A standard data pre-processing step in predictive modelling comprises both data representation and category definition. Data representation extracts features from the raw dataset to represent it. Much research has been done on predicting important player behaviours with game-specific data representations. Some of these efforts have achieved competitive performance; however, because of the game-specific data representations they apply, game companies need to spend extra effort to reuse the proposed methods across more than one product. This work proposes an event-frequency-based data representation that is generally applicable to games. This method of data representation relies only on counts of in-game events rather than on prior knowledge of the game. To verify its generality and performance, it was applied to three games of different genres to predict players' first-purchase, disengagement and churn behaviours. Experiments show that this data representation can provide competitive performance across different games. Category definition is another essential component of classification problems. Because labelling methods that rely on specific conditions to assign players to classes often lead to imbalanced classification problems, this work applied two commonly used approaches, random undersampling and the Synthetic Minority Over-Sampling Technique (SMOTE), to rebalance the imbalanced tasks. Results suggest that undersampling provides better performance when the quantity of data is sufficient, whereas SMOTE is the more promising option when the dataset is too small to be balanced by undersampling.
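The abstract gives no implementation details, so the following is only a minimal Python sketch of the pipeline it outlines, under stated assumptions: raw logs are assumed to be `(player_id, event_name)` pairs, each player is represented by raw counts of a fixed list of event types, and the two rebalancing approaches are written from their textbook definitions. The function names `event_frequency_features`, `random_undersample` and `smote` are hypothetical and not taken from the thesis, which may, for example, count events within time windows or rely on library implementations instead.

```python
import math
import random
from collections import Counter


def event_frequency_features(event_log, event_types):
    """Represent each player as a vector of in-game event counts.

    event_log: iterable of (player_id, event_name) pairs (assumed log schema).
    event_types: fixed ordering of event names; one feature column per name.
    """
    counts = {}
    for player_id, event_name in event_log:
        counts.setdefault(player_id, Counter())[event_name] += 1
    # Events not listed in event_types are ignored; unseen events count as 0.
    return {p: [c[e] for e in event_types] for p, c in counts.items()}


def random_undersample(X, y, seed=0):
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):  # sample without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate SMOTE-style synthetic rows for the minority class.

    Each new row lies on the segment between a randomly chosen minority row
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

For example, with event types `["login", "level_up", "purchase"]`, a player who logged in twice and purchased once is represented as `[2, 0, 1]`, with no game-specific knowledge beyond the list of event names. Undersampling discards majority-class rows, so it only suits tasks with enough data, while SMOTE adds interpolated minority rows instead. In either case, rebalancing is normally applied to the training split only, so that evaluation reflects the original class distribution.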
This work also proposes a new category-definition method that yields a class distribution closer to balanced. The parameters of this method can also provide insight into the health of the game. Preliminary experimental results show that, when applied to different games, this category-definition method improves the balance of the class distribution and provides significantly better performance than random classifiers.
Supervisor: Kudenko, Daniel; Devlin, Sam
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available