Title: Finite mixture modeling with non-local priors
Author: Fúquene Patiño, Jairo A.
ISNI:       0000 0004 7657 8220
Awarding Body: University of Warwick
Current Institution: University of Warwick
Date of Award: 2018
Choosing the number of mixture components remains a central but elusive challenge. Traditional model selection criteria can be either overly liberal or overly conservative when enforcing parsimony, and they may also yield poorly separated components of limited practical use. In this thesis, parsimony refers to selecting a simpler model by enforcing a separation between the models under consideration, and sparsity refers to the ability to penalize overfitted models, leading to well-separated components with non-negligible weight that are interpretable as distinct subpopulations. Non-local priors (NLPs) are a family of distributions that encourage parsimony by enforcing such a separation. In this thesis we investigate the use of NLPs to choose the number of components in mixture models. Our main contributions are proposing the use of NLPs to select the number of components, characterizing the properties of the associated inference (in particular, improved sparsity), and proposing tractable expressions suitable for prior elicitation, simple and computationally efficient algorithms, and practical applications.
Chapter 2 develops the theoretical framework. We present NLPs in the context of mixtures and show how they lead to well-separated components that have non-negligible weight and are hence interpretable as distinct subpopulations. Moreover, we formulate a general NLP class, propose a particular choice leading to tractable expressions, and give a theoretical characterization of the sparsity induced by NLPs when choosing the number of mixture components. Although the framework is generic, we fully develop multivariate Normal, Binomial and product Binomial mixtures based on a family of exchangeable moment priors. Chapter 3 presents the prior computation and elicitation. We suggest default prior settings based on detecting multi-modal Normal and T mixtures, and on minimal informativeness for categorical outcomes, where multi-modality is not a natural consideration. The theory and underlying principles hold more generally, however, as outlined in Chapter 2. Chapter 4 presents the computational framework for model selection and fitting. We propose simple algorithms based on Markov chain Monte Carlo (MCMC) methods and Expectation Maximization algorithms to obtain the integrated likelihood and parameter estimates.
Chapters 5-7 contain the simulation studies and applications. In Chapter 5 we compare the performance of our proposal to its local prior counterpart as well as to the Bayesian Information Criterion (BIC), the singular Bayesian Information Criterion (sBIC) and the Akaike Information Criterion (AIC). Our results show a serious lack of sensitivity of the BIC, and insufficient parsimony of the AIC and of the local prior counterpart to our formulation. The sBIC behaved like the BIC in some examples and like the AIC in others. In Chapter 6 we explore a computationally fast non-local model selection criterion and propose a new computational strategy that provides a direct connection between cluster occupancies and Bayes factors, with the advantage that Bayes factors allow for more general model comparisons (for instance, equal vs. unequal covariances in Normal mixtures). This strategy helps to discard unoccupied clusters in overfitted mixtures, and the result has interest beyond purely computational purposes, e.g. to set thresholds on empty cluster probabilities in overfitted mixtures. In Chapter 7 we present the applications and also offer comparisons to overfitted and repulsive overfitted mixtures. In most examples the performance of these alternatives was competitive but depended on setting the prior parameters adequately to prevent the appearance of spurious components. The number of components inferred under NLPs was closer to the true number (when this was known) and remained robust to prior parameter changes, provided these remained in the range of recommended defaults.
Chapter 8 presents the conclusions and some possible future directions of this work. Finally, Appendix A presents the proofs of Theorem 1 and of the auxiliary lemmas and corollaries, Appendix B shows the MCMC diagnostics, and Appendix C presents the main probability density functions used throughout this thesis.
Supervisor: Not available
Sponsor: University of Warwick
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: QA Mathematics