Title:
|
Statistical aspects of persistent homology
|
This thesis investigates statistical approaches to interpreting the output of persistent
homology, a multi-resolution algorithm for discovering topological structure in data.
We provide a brief introduction to the theory of topology and homology. The output of
persistent homology is a set of intervals, visualised either as a 'barcode' or as a set of points called a persistence
diagram. We discuss suitable metrics for persistence diagrams. The following chapter
demonstrates how to compute persistent homology using R.
Following this foundational work, we construct a confidence set for the true persistence diagram
of the underlying space from a sample diagram. Such sets aid the interpretation
of persistence diagrams by distinguishing points that are likely to represent
true topological features from points that are noise due to sampling. We present
two methods of constructing confidence sets. The first assumes that the support of the
sampling density is not too 'spiky'. The second method makes the stronger assumption that
the data are a realisation of a homogeneous Poisson process, which leads to a less conservative
confidence set.
In the middle section of this thesis, we investigate further sampling properties of persistence
diagrams. Sampling on the circle leads us to propose a barcode test of sampling
uniformity. We look at the diagrams of samples from the unit square, which is topologically
simple, and propose these as a model for the noise in diagrams from other
spaces. We propose density-corrected persistent homology, which makes sample diagrams
less sensitive to the geometry of the underlying space and the sampling density.
In the last section of this thesis, we demonstrate how persistent homology can be used to
identify topological structure in correlation and partial correlation matrices. This relates
to the problem of structure learning in graphical models.
|