Title:

Discrete Weibull regression model for count data

Data can be collected in the form of counts in many situations. In other words, the number of deaths from an accident, the number of days until a machine stops working or the number of annual visitors to a city may all be considered as interesting variables for study. This study is motivated by two facts; first, the vital role of the continuous Weibull distribution in survival analyses and failure time studies. Hence, the discrete Weibull (DW) is introduced analogously to the continuous Weibull distribution, (see, Nakagawa and Osaki (1975) and Kulasekera (1994)). Second, researchers usually focus on modeling count data, which take only nonnegative integer values as a function of other variables. Therefore, the DW, introduced by Nakagawa and Osaki (1975), is considered to investigate the relationship between count data and a set of covariates. Particularly, this DW is generalised by allowing one of its parameters to be a function of covariates. Although the Poisson regression can be considered as the most common model for count data, it is constrained by its equidispersion (the assumption of equal mean and variance). Thus, the negative binomial (NB) regression has become the most widely used method for count data regression. However, even though the NB can be suitable for the overdispersion cases, it cannot be considered as the best choice for modeling the underdispersed data. Hence, it is required to have some models that deal with the problem of underdispersion, such as the generalized Poisson regression model (Efron (1986) and Famoye (1993)) and COMPoisson regression (Sellers and Shmueli (2010) and SáezCastillo and CondeSánchez (2013)). Generally, all of these models can be considered as modifications and developments of Poisson models. However, this thesis develops a model based on a simple distribution with no modification. Thus, if the data are not following the dispersion system of Poisson or NB, the true structure generating this data should be detected. Applying a model that has the ability to handle different dispersions would be of great interest. Thus, in this study, the DW regression model is introduced. Besides the exibility of the DW to model under and overdispersion, it is a good model for inhomogeneous and highly skewed data, such as those with excessive zero counts, which are more disperse than Poisson. Although these data can be fitted well using some developed models, namely, the zeroinated and hurdle models, the DW demonstrates a good fit and has less complexity than these modifed models. However, there could be some cases when a special model that separates the probability of zeros from that of the other positive counts must be applied. Then, to cope with the problem of too many observed zeros, two modifications of the DW regression are developed, namely, zeroinated discrete Weibull (ZIDW) and hurdle discrete Weibull (HDW) models. Furthermore, this thesis considers another type of data, where the response count variable is censored from the right, which is observed in many experiments. Applying the standard models for these types of data without considering the censoring may yield misleading results. Thus, the censored discrete Weibull (CDW) model is employed for this case. On the other hand, this thesis introduces the median discrete Weibull (MDW) regression model for investigating the effect of covariates on the count response through the median which are more appropriate for the skewed nature of count data. In other words, the likelihood of the DW model is reparameterized to explain the effect of the predictors directly on the median. Thus, in comparison with the generalized linear models (GLMs), MDW and GLMs both investigate the relations to a set of covariates via certain location measurements; however, GLMs consider the means, which is not the best way to represent skewed data. These DW regression models are investigated through simulation studies to illustrate their performance. In addition, they are applied to some real data sets and compared with the related count models, mainly Poisson and NB models. Overall, the DW models provide a good fit to the count data as an alternative to the NB models in the overdispersion case and are much better fitting than the Poisson models. Additionally, contrary to the NB model, the DW can be applied for the underdispersion case.
