Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.815254
Title: Addressing class imbalance for logistic regression
Author: Li, Yazhe
ISNI:       0000 0004 9357 1944
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2020
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
The challenge of class imbalance arises in classification problem when the minority class is observed much less than the majority class. This characteristic is endemic in many domains. Work by [Owen, 2007] has shown that, in a theoretical context related to infinite imbalance, logistic regression behaves such that all data in the rare class can be replaced by their mean vector to achieve the same coefficient estimates. Such results suggest that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this thesis, we focus on highly imbalanced logistic regression and develop mitigation methods and diagnostic tools. Theoretically, we extend the [Owen, 2007] results to show the phenomenon remains true for both weighted and penalized likelihood methods in the infinitely imbalanced regime, which suggests these alternative choices to logistic regression are not enough for highly imbalanced logistic regression. For mitigation methods, we propose a novel relabeling solution based on relabeling the minority class to handle imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. Two algorithms (the Genetic algorithm and the Expectation Maximization algorithm) are formalized to serve as tools for computing this relabeling. In simulation and real data experiments, we show that logistic regression is not able to provide the best out-of-sample predictive performance, and our relabeling approach that can capture underlying structure in the minority class is often superior. For diagnostic tools to detect highly imbalanced logistic regression, different hypothesis testing methods, along with a graphical tool are proposed, based on the mathematical insights about highly imbalanced logistic regression. Simulation studies provide evidence that combining our diagnostic tools with mitigation methods as a systematic strategy has the potential to alleviate the class imbalance problem among logistic regression.
Supervisor: Adams, Niall ; Bellotti, Anthony Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.815254  DOI:
Share: