Generalized linear models for large dependent data sets
Generalized linear models (GLMs) were originally developed to build regression models for independent responses. In recent years, however, much effort has gone into extending the original GLM theory so that it can be applied to data that exhibit dependence in the responses. This thesis focuses on some specific extensions of the GLM theory for dependent responses. A new hypothesis-testing technique is proposed for the application of GLMs to cluster-dependent data. The test is based on an adjustment to the 'independence' likelihood ratio test that allows for within-cluster dependence. The performance of the new test is compared with that of established techniques. The application of the generalized estimating equations (GEE) methodology to space-time data is also investigated. The approach allows for temporal dependence via the covariates and models spatial dependence using techniques from geostatistics. Much of the work is motivated by applications in climatology. A key attribute of climate data sets, in addition to exhibiting both spatial and temporal dependence, is that they are typically large, often running into millions of observations. Therefore, throughout the thesis, particular attention is paid to computational issues, so that analyses can be carried out in a feasible time frame. For example, we investigate the use of the GEE one-step estimator in situations where the application of the full algorithm is impractical. The final chapter of the thesis presents a climate case study of wind speeds over northwestern Europe, which we analyse using the techniques developed.
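As an illustration of the one-step idea mentioned above, the following is a minimal sketch only, not the implementation used in the thesis. It assumes a Poisson response with log link, an exchangeable working correlation and a moment estimate of the correlation parameter; the abstract specifies none of these choices. All function names and the simulated data are hypothetical. The sketch fits the model under independence and then takes a single Fisher scoring step on the GEE estimating equations, rather than iterating to convergence.

```python
import numpy as np

def fit_independence(X, y, n_iter=25):
    """Poisson log-link GLM fitted by IRLS, ignoring any dependence."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # IRLS weights equal mu for Poisson/log
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

def exchangeable_rho(X, y, clusters, beta):
    """Moment estimate of a common within-cluster correlation (assumed form)."""
    mu = np.exp(X @ beta)
    r = (y - mu) / np.sqrt(mu)           # Pearson residuals
    phi = np.mean(r ** 2)                # crude dispersion estimate
    num, pairs = 0.0, 0
    for c in np.unique(clusters):
        rc = r[clusters == c]
        n = rc.size
        num += (rc.sum() ** 2 - (rc ** 2).sum()) / 2.0  # sum over pairs j < k
        pairs += n * (n - 1) // 2
    return num / (pairs * phi)

def gee_one_step(X, y, clusters, beta0, rho):
    """One Fisher scoring step on the GEE equations, starting from beta0."""
    p = X.shape[1]
    H = np.zeros((p, p))
    g = np.zeros(p)
    for c in np.unique(clusters):
        idx = clusters == c
        Xc, yc = X[idx], y[idx]
        mu = np.exp(Xc @ beta0)
        n = mu.size
        R = np.full((n, n), rho)
        np.fill_diagonal(R, 1.0)         # exchangeable working correlation
        s = np.sqrt(mu)
        V = s[:, None] * R * s[None, :]  # working covariance A^(1/2) R A^(1/2)
        D = mu[:, None] * Xc             # d mu / d beta for the log link
        VinvD = np.linalg.solve(V, D)
        H += D.T @ VinvD
        g += VinvD.T @ (yc - mu)         # V is symmetric
    return beta0 + np.linalg.solve(H, g)

# Simulated clustered counts (illustrative data only).
rng = np.random.default_rng(0)
n_clusters, size = 200, 5
clusters = np.repeat(np.arange(n_clusters), size)
x = rng.normal(size=n_clusters * size)
b = rng.normal(scale=0.5, size=n_clusters)[clusters]  # shared cluster effect
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x + b))

beta0 = fit_independence(X, y)
rho = exchangeable_rho(X, y, clusters, beta0)
beta1 = gee_one_step(X, y, clusters, beta0, rho)
print("independence:", beta0, "rho:", rho, "one-step:", beta1)
```

With the working correlation set to zero the step leaves the independence estimate unchanged, which is a convenient sanity check. For data sets with millions of observations the per-cluster loop would need to be vectorised or processed in blocks; it is written this way here only for readability.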