Title:

Contributions to the Bayesian analysis of mixture models

Mixture models can be used to approximate irregular densities or to model heterogeneity.
When a density estimate is needed, we can approximate any distribution on
the real line using an infinite number of normals (Ferguson (1983)). On the other hand,
when a mixture model is used to model heterogeneity, there is a proper interpretation for
each element of the model. If the distributional assumptions about the components are
met and the number of underlying clusters within the data is known, then, in a Bayesian
setting, methods to undo the label switching and recover the interpretation of the
components must be applied before classification analysis and, more generally,
component-specific inference can be performed. If latent allocations are included in the
design of the Markov chain Monte Carlo (MCMC) strategy, and the sampler has converged,
then the labels assigned to each component may change from iteration to iteration.
However, observations allocated together must remain similar, and we use this
fundamental fact to derive an easy and efficient solution to the label switching problem.
We compare our strategy with other relabeling algorithms on univariate and multivariate
data examples and demonstrate improvements over alternative strategies.
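The idea that jointly allocated observations stay similar across iterations can be illustrated with a minimal relabeling sketch. The code below is only a generic brute-force illustration of undoing label switching by matching each draw's allocation vector to a reference allocation; it is not the thesis's algorithm, and the function name and reference choice are assumptions for the example.

```python
import itertools
import numpy as np

def relabel_by_allocation(alloc_draws, reference):
    """Permute each MCMC draw's component labels so its allocation
    vector agrees as much as possible with a reference allocation.

    alloc_draws : (n_iter, n_obs) int array of sampled allocations
    reference   : (n_obs,) int array, e.g. the allocation at one
                  well-separated draw (a hypothetical choice here)
    Brute-forces all K! permutations, so it is only practical for
    small K; it merely illustrates the principle.
    """
    K = int(max(alloc_draws.max(), reference.max())) + 1
    out = np.empty_like(alloc_draws)
    for t, z in enumerate(alloc_draws):
        best, best_score = None, -1.0
        for perm in itertools.permutations(range(K)):
            z_perm = np.array(perm)[z]            # apply label permutation
            score = np.mean(z_perm == reference)  # agreement with reference
            if score > best_score:
                best, best_score = z_perm, score
        out[t] = best
    return out

# toy example: the second draw is a label-switched copy of the first
draws = np.array([[0, 0, 1, 1],
                  [1, 1, 0, 0]])
ref = np.array([0, 0, 1, 1])
print(relabel_by_allocation(draws, ref))
# → both rows become [0 0 1 1] once the switch is undone
```

After relabeling, component-specific summaries (posterior means, classification probabilities) can be computed on the aligned draws.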
When there is no further information about the shape of the components and the number
of clusters within the data, a common theme is the use of the normal distribution
as the "benchmark" component distribution. However, if a cluster is skewed or heavy
tailed, then the normal distribution will be inefficient and many normal components may
be needed to model a single cluster. In this thesis, we present an attempt to solve this
problem. We define a cluster to be a group of data which can be modeled by a unimodal
density function. Hence, our intention is to use a family of univariate distribution
functions, to replace the normal, for which the only constraint is unimodality. With this
aim, we devise a new family of nonparametric unimodal distributions, which has large
support over the space of univariate unimodal distributions. The difficult aspect of the
Bayesian model is to construct a suitable MCMC algorithm to sample from the correct
posterior distribution. The key will be the introduction of strategic latent variables and
the use of the product space (Godsill (2001)) view of reversible jump (Green (1995))
methodology. We illustrate and compare our methodology against the classic mixture of
normals using simulated and real data sets. To solve the label switching problem we use
our new relabeling algorithm.
