Title: A formal treatment of lossless data compression algorithms
Author: Stratford, Barney
ISNI: 0000 0001 3488 0972
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2005
Availability of Full Text: Full text unavailable from EThOS.
Abstract:
Since its inception, data compression has been practised mostly as an experimental science. Although this thesis continues that trend to some extent, its main emphasis is on formal derivations of compression algorithms and proofs of their correctness. Such a mathematical approach has not been taken before, and we have found that it has yielded significant dividends in the form of faster compression.

Modern compression schemes can be viewed as a combination of a statistical model and an entropy coder. The method developed in this thesis consists of a PPM (Prediction by Partial Matching) model with arithmetic coding. Other methods, including dictionary-based algorithms and the Burrows-Wheeler Transform, are discussed briefly.

Arithmetic coding can be seen as a generalisation of Huffman coding, with fewer restrictions and better compression. However, the algorithm is quite difficult to analyse and understand, and it is very easy to make a mistake that renders the program incorrect. Our formal approach simplifies the explanation and gives us confidence in the final software.

PPM represents the state of the art in statistical modelling, and many of its variants outperform all other known methods. Its major drawbacks are that it is very slow and that it requires temporary storage space linear in the size of the input. A number of its design decisions, while intuitively sensible, are not backed up by any theory. We aimed to justify our version of PPM using Bayesian statistics. Although this approach did not entirely succeed, there was some significant progress towards the target.

Our derivations are carried out using notation drawn from the functional programming language Haskell. Haskell provides a number of advantages over the more traditional imperative languages, although all programs are given in C in an appendix.
Supervisor: Bird, Richard
Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available