Categorising variables in medical contexts
Many medical studies involve modelling the relationship between an outcome variable and one or more continuous/interval scaled discrete explanatory variables. It is common practice in many of these studies for some, or indeed all, of the continuous/interval scaled discrete explanatory variables to be incorporated into the analysis in a categorised or grouped form. One of the main reasons for adopting this methodology is that it simplifies the interpretation of results for clinicians and, hopefully, patients. It is often easier to interpret conclusions based on an explanatory variable with two or three levels (i.e. categorisations) than from a continuous/interval scaled discrete explanatory variable. The main drawback of this technique lies in identifying the categorisation points. Often preconceived and/or historical grounds determine the location of these categorisation points. However, this may not give rise to sensible or justifiable locations for such points in a given application. This thesis will consider the analysis of data from various types of medical study and, by applying non-parametric statistical methodology, provide an alternative, more logical rationale for identifying categorisation points. The analysis will concentrate on data from three specific types of medical study: a cohort study with a binary outcome, a matched case/control study and survival analysis. In a cohort study with a binary response the standard methodology of logistic regression will be applied and extended using a non-parametric logistic approach to identify potential categorisation points. As a further extension, consideration will be given to the more formal methodology of examining the first derivative of the resultant non-parametric logistic regression to provide the location of categorisation points. In matched case/control studies the standard technique used for analysis is conditional logistic regression.
The theory and application of this model will be discussed before considering two new, alternative, non-parametric approaches to analysing matched case/control studies with an interval scaled discrete explanatory variable. The proposed non-parametric approaches will be tested to investigate their usefulness in identifying categorisations for the explanatory variable. Possible extensions to these approaches to incorporate a single continuous explanatory variable will be discussed. A simulation study will be carried out to compare the power of the two non-parametric approaches. Finally, consideration will be given to the analysis of survival data. Initially, the standard methodologies of the Kaplan-Meier estimator in the absence of explanatory variables and Cox's Proportional Hazards model to incorporate explanatory variables will be discussed. A more detailed examination of three alternative methods for analysing survival data in the presence of a single continuous explanatory variable will be carried out. Each of the methods will be applied in turn to a survival analysis problem to investigate whether any categorisations can be identified for a single continuous explanatory variable. Further simulations will be undertaken to compare the three methods across a variety of scenarios.
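
As a rough illustration of the cohort-study idea described above (smoothing a binary outcome against a continuous explanatory variable and examining the first derivative of the fitted curve), the following minimal sketch may help. It is not the method developed in the thesis: it swaps in a simple Nadaraya-Watson kernel smoother with a Gaussian kernel in place of a non-parametric logistic regression, and the function names, the bandwidth and the toy data are all assumptions made purely for illustration.

```python
import math


def kernel_smooth(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of P(y = 1 | x) at each grid point.

    Assumption: a Gaussian kernel stands in for a non-parametric
    logistic fit; the thesis may use a different smoother.
    """
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return fitted


def steepest_point(grid, fitted):
    """Return the interior grid value where the central-difference slope
    of the fitted curve is largest in magnitude, together with that slope."""
    best_g, best_slope = None, 0.0
    for i in range(1, len(grid) - 1):
        slope = (fitted[i + 1] - fitted[i - 1]) / (grid[i + 1] - grid[i - 1])
        if abs(slope) > abs(best_slope):
            best_g, best_slope = grid[i], slope
    return best_g, best_slope


# Hypothetical, deterministic toy data: the outcome switches from 0 to 1
# between x = 59 and x = 60 (not data from any real study).
x = list(range(20, 101))
y = [1 if xi >= 60 else 0 for xi in x]

grid = [20 + 0.5 * k for k in range(161)]  # 20.0, 20.5, ..., 100.0
fitted = kernel_smooth(x, y, grid, bandwidth=4.0)
cut, slope = steepest_point(grid, fitted)
print("candidate categorisation point:", cut)
```

On this toy data the steepest part of the smoothed curve sits next to the jump in the outcome, which is the sense in which a first derivative can suggest a categorisation point. With real data the choice of bandwidth matters considerably and the derivative would need some assessment of its variability before any point is adopted.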