Analysis of survival data in breast cancer
There are three objectives in the thesis: description of the natural duration of treated breast cancer, prognostic staging for breast cancer, and study of curability from the disease. The thesis begins with a review of the relevant literature. There follows a general description of the data. In particular, the pattern of referral and the distribution of patients by covariates are given. Analysis of multi-way contingency tables leads to general inferences about how breast cancer presents. Estimated hazard functions and life-tables are presented for different prognostic factors, and covariates which are associated with poor prognosis are identified. Hazard functions are important because they display more vividly than life-tables aspects which are relevant for the statistical modelling of survival. International stage separates patients into 4 groups with progressively poorer survival. Hazard is monotone decreasing for patients with the most advanced disease, stage 4; more than 60% of such patients die within one year of diagnosis. For patients with International stage 1, 2 or 3 breast cancer a different pattern emerges. Estimated hazard increases during the first 1 to 4 years after diagnosis and then slowly declines. Peak hazard occurs earlier in stage 3 and stage 2 disease, but by the tenth year after diagnosis the hazard functions converge for survivors from all 3 groups. The failure of standard regression models (Weibull, proportional hazards, log-logistic) to represent the survival of patients with breast cancer is then demonstrated. Of course, study of the estimated hazard functions, as previously described, anticipated failure for two of the 3 models. In particular, the Weibull family with a common shape parameter imposes a monotone hazard function on each prognostic group, which is inappropriate. The proportional hazards model can accommodate neither the convergence of hazard functions nor the diversity of times to peak hazard.
Although the log-logistic family allows for both of these, there is evidence of lack of fit. An interesting feature of these analyses is agreement on the relative importance of clinical covariates. This feature is shown also when a linear-logistic model is fitted to the probability of death within 10 years of diagnosis. Proportionality of hazards must not be taken for granted, especially if there are long-term survivors, as in breast cancer. The pattern of hazard for other diseases in which long-term survival is relatively frequent may also be characterised by convergence of hazard functions and variable time to peak hazard. One interpretation of the convergence of hazard for different prognostic groups is that the influence of covariates diminishes with time. Application of the proportional hazards model to discrete time intervals is introduced as a descriptive technique by which to investigate this phenomenon. A different constant of proportionality is estimated for each interval. This leads to a modified Cox model in which the influence of covariates is allowed to change smoothly with time according to a prescribed function. The descriptive method is seen as a valuable stepping-stone because it provides initial estimates for parameters in the final model. Also developed is a prognostic staging system for breast cancer based on the foregoing models. This system has a simple description in terms of elementary scores, and is compared with the International stage classification, which was devised empirically rather than statistically. Awareness of the natural history of breast cancer has had two major effects on the presentation of survival data. Firstly, it has led to recognition that conventional fixed-time survival at 5 or 10 years is not synonymous with cure, and secondly, the long latent interval in breast cancer makes it necessary to allow for the normal mortality experience in the general population. The final topic is therefore age-corrected survival.
The methods developed have general application to the analysis of long-term survival. Allowance is made for expected mortality from causes other than breast cancer by reference to the age-specific death rates in the Scottish national population. An approximation to the current complete life-table for Scotland is derived by modelling the hazard function as exponential in 5 age-intervals. This life-table approximation is used to develop alternative statistical models for age-corrected survival which permit analysis by several covariates. The two models for the age-corrected life-table of 5-year survivors are inspired by noticing that the excess death rate and the ratio of observed to expected deaths decline exponentially from 5 years after diagnosis. They are additive and relative hazard models in which the influence of prognostic factors decays exponentially with time. For breast cancer, the additive hazard model gives a simpler description, suggesting that only clinical covariates, not age, are relevant for curability. By 20 years after diagnosis, the confidence intervals for relative hazard overlap considerably in different prognostic groups.
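
The additive and relative hazard models described above can be illustrated with a minimal sketch. It is not the parameterisation used in the thesis, which the abstract does not give. The sketch assumes that the population hazard is log-linear in age within each of 5 age-intervals, that the excess term has the form exp(score - gamma * (t - 5)), and that the relative model multiplies the population hazard by one plus that term. All names, breakpoints and coefficients are hypothetical and are not the Scottish death rates.

```python
import math

# Hypothetical lower bounds of 5 age-intervals (years). Illustrative only.
AGE_BREAKS = [0.0, 30.0, 45.0, 60.0, 75.0]
# Hypothetical log-hazard at the start of each interval and slope per year,
# chosen so that the piecewise function is continuous at the breakpoints.
LOG_H0 = [-7.0, -7.0, -5.8, -4.6, -3.4]
SLOPE = [0.0, 0.08, 0.08, 0.08, 0.09]


def population_hazard(age):
    """Expected annual death rate at a given age (assumes age >= 0)."""
    j = max(i for i, lo in enumerate(AGE_BREAKS) if age >= lo)
    return math.exp(LOG_H0[j] + SLOPE[j] * (age - AGE_BREAKS[j]))


def decaying_effect(t, score, gamma, t0=5.0):
    """Covariate effect exp(score) at t0 years, decaying at rate gamma.

    `score` stands for a linear predictor of clinical covariates;
    this functional form is an assumption made for illustration.
    """
    return math.exp(score - gamma * (t - t0))


def additive_hazard(t, age_at_diagnosis, score, gamma):
    """Population hazard plus an exponentially decaying excess death rate."""
    return population_hazard(age_at_diagnosis + t) + decaying_effect(t, score, gamma)


def relative_hazard(t, age_at_diagnosis, score, gamma):
    """Population hazard times a ratio that decays towards 1."""
    return population_hazard(age_at_diagnosis + t) * (1.0 + decaying_effect(t, score, gamma))


if __name__ == "__main__":
    # Hazard for a patient diagnosed at age 50, from 5 years after diagnosis.
    for t in (5.0, 10.0, 20.0, 30.0):
        expected = population_hazard(50.0 + t)
        add = additive_hazard(t, 50.0, score=-3.0, gamma=0.2)
        rel = relative_hazard(t, 50.0, score=1.0, gamma=0.2)
        print(f"t={t:4.0f}  expected={expected:.4f}  additive={add:.4f}  relative={rel:.4f}")
```

In both forms the modelled hazard approaches the expected population hazard as time since diagnosis grows, which is one way to express convergence of prognostic groups among long-term survivors.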