Title: Statistical methods for monitoring multiple data streams
Author: Lau, Fatih Din-Houn
ISNI: 0000 0004 5348 9911
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2014
This thesis develops new methods for monitoring multiple data streams and reporting a quantity of interest over time. We consider two settings. In the first, a data stream is treated as realisations of a sequence of independent random variables revealed over time. To monitor the individual streams, we propose a new type of control chart based on the cumulative sum (CUSUM) chart. CUSUM charts are typically used to detect a change in the distribution of a sequence of observations, e.g., a shift in the mean. Usually, after signalling, the chart is restarted by setting it to some value below the signalling threshold. We propose a non-restarting CUSUM chart that can detect periods during which the stream is out of control. We also advocate an upper boundary that prevents the chart from rising too high, which helps to detect a change back into control. We prove that the non-restarting charts are optimal in a well-defined sense, and we investigate how their performance changes as the upper boundary is varied. Simulation results show a trade-off between the height of the upper boundary and the false signal rate. We then present an algorithm that uses the non-restarting charts to control the false discovery rate (FDR) across multiple data streams. We consider two definitions of a false discovery. The first is signalling out of control when the observations have been in control since the start. The second is signalling out of control when the observations have been in control since the chart was last at zero. We prove that the FDR is controlled under both definitions simultaneously. Simulations illustrate how FDR control differs under these and other desirable definitions of a false discovery.

In the second setting, a data stream is treated as observations of a Bayesian model revealed over time. The aim is to report a posterior summary of interest quickly and to within a user-specified degree of accuracy. We present a system to tackle such problems. The estimates are calculated from weighted samples stored in a database. These samples are maintained so that both the accuracy of the estimates and the quality of the samples remain satisfactory. This maintenance involves varying the number of samples in the database and updating their weights. When required, new samples are generated by a Markov chain Monte Carlo (MCMC) algorithm. The system is demonstrated on a football league model used to predict the end-of-season table. The correctness and accuracy of its estimates are shown in a simulation using a linear Gaussian model. Lastly, we investigate potential improvements to the system. A series of motivating simulations illustrates some of its potential problems, and remedies are suggested with a view to implementing them in the near future.
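The abstract describes the non-restarting CUSUM chart with an upper boundary only in words. The sketch below is a minimal illustration of that idea and rests on several assumptions not stated in the record. It detects a mean shift from N(0, 1) to N(`mu1`, 1) using log-likelihood-ratio increments. The signalling threshold `h` and upper boundary `b` are hypothetical parameter names. A time step is flagged whenever the chart is at or above `h`, which may not match the thesis's exact signalling rule. The chart is never reset after a signal; it is only floored at 0 and capped at `b`.

```python
import math
from typing import List, Sequence, Tuple


def nonrestarting_cusum(
    xs: Sequence[float],
    mu1: float,
    h: float,
    b: float = math.inf,
) -> Tuple[List[float], List[bool]]:
    """CUSUM chart for a mean shift from N(0, 1) to N(mu1, 1).

    The chart is never restarted after a signal and is capped at the
    (hypothetical) upper boundary b. Returns the chart values and, for
    each step, whether the chart is at or above the threshold h.
    """
    if not 0 < h <= b:
        raise ValueError("need 0 < h <= b")
    s = 0.0
    values: List[float] = []
    flags: List[bool] = []
    for x in xs:
        # Log-likelihood ratio of N(mu1, 1) against N(0, 1) for one observation.
        llr = mu1 * x - 0.5 * mu1 ** 2
        # Page's recursion, floored at 0 and capped at the upper boundary b.
        s = min(b, max(0.0, s + llr))
        values.append(s)
        flags.append(s >= h)
    return values, flags


# In control, then a shifted mean, then back in control.
xs = [0, 0, 2, 2, 2, 0, 0, 0, 0]
print(nonrestarting_cusum(xs, mu1=1.0, h=2.0, b=3.0)[1])
# [False, False, False, True, True, True, True, False, False]
print(nonrestarting_cusum(xs, mu1=1.0, h=2.0)[1])
# [False, False, False, True, True, True, True, True, True]
```

With the cap at `b = 3`, the chart falls back below `h` at the eighth observation. The uncapped chart is still above `h` at the end of the sequence. This is the "detect a change back into control" effect the upper boundary is meant to provide. A lower `b` speeds this up but, per the simulations mentioned above, trades off against the false signal rate.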
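The record does not spell out the FDR-controlling algorithm used with the non-restarting charts. As a generic stand-in, not the thesis's method, the classical Benjamini–Hochberg step-up procedure shows what controlling the FDR across m streams means in practice. It assumes each stream's chart has already been converted into a p-value; how the thesis does this is not described here. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure at FDR level alpha.

    Returns one flag per stream: True if that stream is declared out of control.
    """
    m = len(pvalues)
    # Stream indices ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= k * alpha / m (step-up rule).
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Declare the k_max streams with the smallest p-values out of control.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.001, 0.03, 0.035, 0.9], alpha=0.05))
# [True, True, True, False]
```

The stream with p = 0.03 is flagged even though it fails its own threshold (0.025). This is because a larger p-value (0.035) passes at the next rank. That step-up behaviour distinguishes the procedure from a simple per-stream cut-off.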
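For the second setting, the record describes the system only at a high level. The sketch below illustrates the general pattern in a resample-move style: stored samples are reweighted as each observation arrives, and MCMC is used to generate fresh samples when the weights degenerate. It uses a scalar linear Gaussian model, θ ~ N(0, τ²) and y_t | θ ~ N(θ, σ²), whose exact posterior mean is known, so the estimate can be checked. Several choices here are assumptions and do not come from the thesis:

- an effective-sample-size (ESS) trigger for refreshing;
- random-walk Metropolis as the MCMC move;
- an in-memory list in place of a database;
- the class, method, and parameter names.

The thesis's rules for varying the number of samples and meeting a user-specified accuracy are not reproduced.

```python
import math
import random
from typing import List


class WeightedSampleStore:
    """Hypothetical store of weighted samples for the scalar model
    theta ~ N(0, tau^2), y_t | theta ~ N(theta, sigma^2) independently."""

    def __init__(self, n: int, tau: float, sigma: float,
                 ess_frac: float = 0.5, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.tau, self.sigma, self.ess_frac = tau, sigma, ess_frac
        self.data: List[float] = []
        # Exact draws from the prior, all with equal (log-)weight.
        self.samples = [self.rng.gauss(0.0, tau) for _ in range(n)]
        self.logw = [0.0] * n

    def log_post(self, theta: float) -> float:
        """Unnormalised log posterior given all observations so far."""
        lp = -0.5 * theta ** 2 / self.tau ** 2
        return lp - 0.5 * sum((y - theta) ** 2 for y in self.data) / self.sigma ** 2

    def weights(self) -> List[float]:
        """Normalised weights, computed stably from the log-weights."""
        top = max(self.logw)
        w = [math.exp(lw - top) for lw in self.logw]
        total = sum(w)
        return [wi / total for wi in w]

    def ess(self) -> float:
        """Effective sample size 1 / sum(w_i^2)."""
        return 1.0 / sum(wi ** 2 for wi in self.weights())

    def observe(self, y: float) -> None:
        """Multiply each weight by the new likelihood term; refresh if degenerate."""
        self.data.append(y)
        for i, theta in enumerate(self.samples):
            self.logw[i] += -0.5 * (y - theta) ** 2 / self.sigma ** 2
        if self.ess() < self.ess_frac * len(self.samples):
            self.refresh()

    def refresh(self, steps: int = 20, step_size: float = 0.5) -> None:
        """Resample by weight, then move each sample by random-walk Metropolis."""
        n = len(self.samples)
        self.samples = self.rng.choices(self.samples, weights=self.weights(), k=n)
        for i in range(n):
            theta = self.samples[i]
            lp = self.log_post(theta)
            for _ in range(steps):
                prop = theta + self.rng.gauss(0.0, step_size)
                lp_prop = self.log_post(prop)
                # Accept with probability min(1, posterior ratio).
                if self.rng.random() < math.exp(min(0.0, lp_prop - lp)):
                    theta, lp = prop, lp_prop
            self.samples[i] = theta
        self.logw = [0.0] * n

    def posterior_mean(self) -> float:
        """Weighted-sample estimate of E[theta | data]."""
        return sum(w * t for w, t in zip(self.weights(), self.samples))


store = WeightedSampleStore(n=500, tau=1.0, sigma=1.0, seed=1)
ys = [1.0, 0.5, 1.5, 0.8, 1.2] * 4
for y in ys:
    store.observe(y)
# Exact conjugate answer with tau = sigma = 1: sum(ys) / (len(ys) + 1) = 20 / 21.
print(store.posterior_mean(), sum(ys) / (len(ys) + 1))
```

Reweighting alone is cheap but lets the weights concentrate on a few samples as data accumulate. The ESS check decides when the more expensive MCMC step is worth paying for.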
Supervisor: Gandy, Axel
Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral