Title:

The construction of variable length codes with good synchronisation properties

This thesis addresses the problems of robust optimal data compression in circumstances where synchronisation may be lost and error propagation can occur. Some background information on binary variable length codes is given, particularly relating to the problems of compression and synchronisation. A review of the literature on the codes known as Huffman Equivalent codes (HEcodes) and Tcodes, reveals many papers that deal with the synchronisation of these codes. HEcodes have short synchronising codewords forced by their construction, which will occur frequently. Tcodes have good synchronisation capabilities, due to the copy and append nature of the construction algorithm. In chapter 2 a method developed by Takishima, Wada and Murakami is described and a theoretical result for the synchronisation delay after an inversion error has occurred is given. In chapter 3 the construction and properties of Tcodes are described. In chapter 4, the construction of HEcodes is presented and some properties of these codes are given. Three algorithms are devised and discussed that give the existence or nonexistence of certain HEcodes. These codes are HEcodes with shortest synchronising codeword 10m ~ 1 l where 0nodes are extended wherever a choice exists, HEcodes with shortest synchronising codeword 10m1 l where 1nodes are extended wherever a choice exists, and HEcodes with two shortest synchronising codewords 10 m1 l and 10 m . If any of these codes does exist, the relevant algorithm also gives the number of synchronising codewords guaranteed by the code's construction. In chapter 5 a number of new algorithms are given for the construction of a code known as a Hybrid code. This Hybrid code is a specialised Tcode. It combines the synchronisation properties of both HEcodes and Tcodes to produce a code with synchronisation properties at least as good as either, where the form and length of the synchronising codewords in the code are known. In particular, theoretical results are given on codes constructed by two algorithms which show why these are good choices as Tcodes. This is also reinforced by simulation results which show that Tcodes constructed by these algorithms outperform other Tcodes in terms of their synchronisation capabilities. Chapter 6 introduces a new variable length code construction algorithm. The codes constructed are called Ordered Termination codes or OTcodes. OTcodes have the major advantage that they exist for all given source probabilities and are optimal in terms of their average codeword length. Theoretical results are given for the existence of certain synchronising codewords in any particular OTcode, along with a theorem that shows that the OTcodes contain at least as many synchronising codewords with paths that end lO"1 " 1 ! as an HEcode with shortest synchronising codeword lO"1 " 1 ! where 0nodes are extended wherever a choice exists. Also an algorithm is presented that gives the guaranteed number of synchronising codewords with paths that end 10^1 where /3e{ml,...,al} and a is the first level at which a node ending 0 must be terminated. Finally in this chapter, discussions are given on other factors that affect the synchronisation of the code. The results given in this chapter, along with the discussions, suggest that the synchronisation properties of the code should be "good". In chapter 7 theoretical results are given comparing OTcodes with certain codes in the literature. Theoretical and simulation results are also given comparing OTcodes, Huffman codes, HEcodes and Tcodes (Hybrid codes). Large text samples are used to carry out these simulations. The results indicate that OTcodes are either significantly better than, or comparable with, the other codes. In chapter 8, conclusions and proposals for future work are presented. The conclusions emphasise that the performance of OTcodes is complemented by the advantage that they always exist, and by the relative simplicity of their construction. The future work proposed is mainly concerned with analysing how the OTcodes behave when insertion and deletion errors are introduced, and also when a combination of both insertion/deletion errors and inversion errors are present. Further work on guaranteed synchronising codewords and on other factors affecting synchronisation performance is also proposed.
