Title: Bayesian estimation of luminosity distributions and model based classification of astrophysical sources
Author: Stampoulis, Vasileios
ISNI: 0000 0004 7223 4789
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2017
The distribution of the flux (observed luminosity) of astrophysical objects is of great interest as a measure of the evolution of various types of astronomical source populations and for testing theoretical assumptions about the Universe. This distribution is examined through the cumulative number of sources, N, with flux greater than a given value, S, plotted on log-log axes; astronomers call this the log(N) − log(S) curve. Estimating the log(N) − log(S) curve from observational data can be quite challenging, however, because statistical fluctuations in the measurements and detector biases introduce measurement uncertainty. Moreover, the location of a source relative to the centre of observation, together with background contamination, can lead to the non-detection of sources (missing data). This effect is more pronounced for low-flux objects, which indicates that the missing-data mechanism is non-ignorable. To avoid inferential biases, the different sources of uncertainty, potential biases and the missing-data mechanism must all be properly accounted for. However, most methods in the relevant literature for estimating the log(N) − log(S) curve assume complete surveys with no missing data. In this thesis, we present a Bayesian hierarchical model that properly accounts for the missing-data mechanism and the other sources of uncertainty. More specifically, we model the joint distribution of the complete data and the model parameters, and then derive the posterior distribution of the model parameters marginalised across all missing data. We use a blocked Gibbs sampler to draw samples from the joint posterior distribution of the parameters of interest. Because the approach is Bayesian, it produces a posterior distribution for the log(N) − log(S) curve rather than a single best-fit estimate. We apply this method to the Chandra Deep Field South (CDFS) dataset.
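As a rough illustration of why missing faint sources bias a naive estimate, the sketch below simulates a power-law flux population, applies a made-up flux-dependent detection probability, and computes the empirical log(N) − log(S) curve N(>S) for both the complete and the detected samples. The power-law index, the logistic detection curve and its parameters (`s50`, `width`), and the helper names are hypothetical assumptions chosen for illustration. This is not the hierarchical model of the thesis; it only shows the bias that such a model is meant to correct.

```python
import numpy as np

def cumulative_counts(fluxes, grid):
    """N(>=S): number of sources with flux at least S, for each S in grid."""
    fluxes = np.sort(np.asarray(fluxes))
    return fluxes.size - np.searchsorted(fluxes, grid, side="left")

def detection_prob(flux, s50=3.0, width=0.5):
    """Hypothetical logistic detection curve in log-flux (50% at s50)."""
    return 1.0 / (1.0 + np.exp(-(np.log(flux) - np.log(s50)) / width))

rng = np.random.default_rng(0)
alpha, s_min, n = 1.5, 1.0, 20000
# Inverse-CDF draws from a power law with P(S > s) = (s / s_min)**(-alpha)
fluxes = s_min * (1.0 - rng.random(n)) ** (-1.0 / alpha)
# Each source is detected independently with a flux-dependent probability
detected = rng.random(n) < detection_prob(fluxes)

grid = np.logspace(0.0, 1.5, 16)                   # thresholds from 1 to ~31.6
n_all = cumulative_counts(fluxes, grid)             # complete survey
n_obs = cumulative_counts(fluxes[detected], grid)   # detected sources only

# Slope of the complete log(N)-log(S) curve; should be close to -alpha
slope_all = np.polyfit(np.log10(grid), np.log10(n_all), 1)[0]
print(f"complete-sample slope: {slope_all:.2f}")
print("fraction detected above each S:", np.round(n_obs / n_all, 2))
```

Because the detection probability falls with flux, the detected curve flattens at low S while tracking the complete curve at high S, so treating the detected sample as a complete survey underestimates the number of faint sources.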
Furthermore, approaching this complicated problem from a fully Bayesian angle enables us to model appropriately the uncertainty in the conversion factor between observed source photon counts and observed luminosity. Using relevant spectral data for the observed sources, the uncertainty in the flux-to-count conversion factor γ for each observed source is expressed through MCMC draws from the posterior distribution of γ for that source. To account for this uncertainty in the non-detected sources, we develop a novel statistical approach for fitting a hierarchical prior on the flux-to-count conversion factor based on the MCMC samples from the observed sources (an approach that can be used in many modelling problems of a similar nature). In a similar manner, we derive the posterior distribution of the model parameters, marginalised across the missing data, and we explore the impact on our posterior estimates of the parameters of interest in the CDFS dataset. Studying the log(N) − log(S) relationship for different source populations can give further insight into the differences between the various types of astronomical populations. Hence, we propose a new soft-clustering scheme for classifying galaxies into different activity classes (Star-Forming Galaxies, LINERs, Seyferts and Composites) using four optical emission-line ratios simultaneously ([NII]/Hα, [SII]/Hα, [OI]/Hα and [OIII]/Hβ). The most widely used classification approach is based on three diagnostic diagrams, which are 2-dimensional projections of these emission-line ratios. These diagnostics assume fixed classification boundaries derived from theoretical models. However, using multiple diagnostic diagrams independently of one another often yields contradictory classifications for the same galaxy, and because the diagrams are 2-dimensional projections of a complex multi-dimensional space, their diagnostic power is limited.
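The abstract does not describe how the thesis fits its hierarchical prior on γ. As a simpler stand-in, the sketch below assumes a normal population prior γ_i ~ N(μ, τ²), summarises each observed source's MCMC draws by their mean and variance (treating each per-source posterior as approximately normal under a flat per-source prior), and estimates μ and τ by maximising the resulting marginal likelihood over a grid of τ values. The function name, the normal assumptions and the simulated draws are all hypothetical; the fitted prior is then used to draw plausible γ values for a non-detected source.

```python
import numpy as np

def fit_normal_hierarchical_prior(draws_per_source, tau_grid=None):
    """Hypothetical helper: fit gamma_i ~ Normal(mu, tau^2) from per-source draws.

    Each source's draws are summarised by mean m_i and variance v_i, so that
    m_i is approximately Normal(mu, tau^2 + v_i). For each tau on the grid,
    mu is profiled out by inverse-variance weighting and the marginal
    log-likelihood is evaluated; the maximising (mu, tau) is returned.
    """
    m = np.array([np.mean(d) for d in draws_per_source])
    v = np.array([np.var(d, ddof=1) for d in draws_per_source])
    if tau_grid is None:
        tau_grid = np.linspace(0.0, 3.0 * np.std(m), 3001)
    best_loglik, best_mu, best_tau = -np.inf, None, None
    for tau in tau_grid:
        w = 1.0 / (tau**2 + v)
        mu = np.sum(w * m) / np.sum(w)
        loglik = -0.5 * np.sum(np.log(tau**2 + v) + w * (m - mu) ** 2)
        if loglik > best_loglik:
            best_loglik, best_mu, best_tau = loglik, mu, tau
    return best_mu, best_tau

# Simulated stand-in for per-source MCMC output (not real CDFS data)
rng = np.random.default_rng(1)
n_src, mu_true, tau_true = 200, 1.0, 0.2
gamma_true = rng.normal(mu_true, tau_true, n_src)
sd = rng.uniform(0.05, 0.3, n_src)                    # per-source posterior width
centre = gamma_true + sd * rng.standard_normal(n_src)
draws = [c + s * rng.standard_normal(500) for c, s in zip(centre, sd)]

mu_hat, tau_hat = fit_normal_hierarchical_prior(draws)
# Prior draws of gamma for a source that was not detected
gamma_missing = rng.normal(mu_hat, tau_hat, 1000)
print(f"mu_hat={mu_hat:.3f}, tau_hat={tau_hat:.3f}")
```

A normal prior is used only for simplicity; since γ is positive, a log-normal prior (the same fit applied to log γ draws) would often be a more natural choice.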
In contrast, we present a data-driven soft-clustering scheme that estimates the posterior probability of each galaxy belonging to each activity class. More specifically, we fit a large number of multivariate Gaussian distributions to the Sloan Digital Sky Survey (SDSS) dataset to capture local structure, and then group these Gaussian components to represent the complex multi-dimensional structure of the joint distribution of the four galaxy activity classes. Finally, we discuss how this soft clustering can lead to estimates of population-specific log(N) − log(S) relationships.
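The grouping step can be illustrated with a small sketch: given an already fitted Gaussian mixture, each galaxy's responsibility for every component is computed, and responsibilities are summed over the components assigned to each class to give soft class probabilities. The component weights, means, covariances and the component-to-class map below are invented placeholders (the fitting step, for example EM on SDSS data, is omitted), and summing responsibilities is an assumed reading of "group the multivariate Gaussian distributions", not necessarily the thesis's exact procedure.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal at each row of x (Cholesky-based)."""
    d = mean.size
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)          # whitened residuals, (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (np.sum(z**2, axis=0) + log_det + d * np.log(2.0 * np.pi))

def class_probabilities(x, weights, means, covs, comp_to_class, n_classes):
    """Soft class membership: sum component responsibilities within each class."""
    log_terms = np.column_stack([np.log(w) + mvn_logpdf(x, m, c)
                                 for w, m, c in zip(weights, means, covs)])
    log_terms -= log_terms.max(axis=1, keepdims=True)    # numerical stability
    resp = np.exp(log_terms)
    resp /= resp.sum(axis=1, keepdims=True)              # P(component k | x)
    probs = np.zeros((x.shape[0], n_classes))
    for k, c in enumerate(comp_to_class):
        probs[:, c] += resp[:, k]                        # P(class c | x)
    return probs

# Placeholder 6-component mixture in the 4-D space of log line ratios
# (log [NII]/Ha, log [SII]/Ha, log [OI]/Ha, log [OIII]/Hb); values are invented.
means = np.array([
    [-0.6, -0.5, -1.5, -0.3],
    [-0.4, -0.4, -1.3, -0.1],
    [ 0.0, -0.1, -0.8,  0.0],
    [ 0.1,  0.0, -0.7,  0.8],
    [ 0.2,  0.1, -0.5,  0.2],
    [ 0.3,  0.2, -0.4,  0.4],
])
covs = [0.01 * np.eye(4) for _ in range(len(means))]
weights = np.full(len(means), 1.0 / len(means))
comp_to_class = [0, 0, 1, 2, 3, 3]   # several components may share one class

galaxies = np.array([
    means[0],                        # sits on a class-0 component
    means[3],                        # sits on the class-2 component
    0.5 * (means[2] + means[4]),     # between a class-1 and a class-3 component
])
probs = class_probabilities(galaxies, weights, means, covs, comp_to_class, 4)
print(np.round(probs, 3))
```

The third galaxy, lying between components of two different classes, receives a probability of roughly one half for each rather than a hard label; such probabilities could, for example, be used as weights when estimating class-specific source counts.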
Supervisor: van Dyk, David
Sponsor: Imperial College London; European Commission
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral