Title: Recording process documentation in the presence of failures in service-oriented architectures

Scientific and engineering communities (e.g., chemistry, bioinformatics and engineering manufacturing) face unprecedented requirements for knowing the provenance of their data products, i.e., where they originated, how they were produced and what has happened to them since their creation. Without this knowledge, scientists and engineers cannot reproduce, analyse or validate experiments and processes. Previous work has proposed a computer-based representation of a past process, termed process documentation, from which provenance can be determined. However, current provenance systems do not adequately address the problem of reliably recording process documentation in large-scale environments such as Service-Oriented Architectures (SOAs), where, for example, a service may become unavailable or a network connection may break. Reliable recording is especially challenging in this setting because the documentation produced by a single process may be spread over multiple provenance repositories across the world. Failures, specifically provenance repository crashes and communication failures, may prevent process documentation from being recorded, so that the evidence that a process occurred is lost. This could have disastrous consequences and is therefore unacceptable in domains that rely on process documentation to determine the provenance of their data products. In this thesis, we systematically analyse all situations that may arise while process documentation is being recorded under the assumed failures. We then present a novel coordinator-based protocol that is formally proven to record complete process documentation. In addition, we use graphs to represent intuitively the topology of process documentation recorded in multiple interlinked provenance repositories, which allows us to investigate whether distributed process documentation can be retrieved in its entirety.
Finally, we evaluate a system architecture that employs the protocol and addresses practical concerns such as communication, storage and performance. The results show that the system can record complete and retrievable process documentation while maintaining acceptable performance.
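The retrievability question mentioned in the abstract can be illustrated with a small sketch. It assumes a simplified, hypothetical model that is not taken from the thesis: each provenance repository is a node, a directed edge from A to B means that documentation recorded in A links to documentation in B, and a process's documentation is retrievable in its entirety from a repository if every repository holding part of it can be reached by following links. The names `reachable_from`, `fully_retrievable` and `entry_points` are illustrative only.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical model: each key is a provenance repository; its value is the set
# of repositories that the documentation recorded there links to.
LinkGraph = Dict[str, Set[str]]


def reachable_from(graph: LinkGraph, start: str) -> Set[str]:
    """Return every repository reachable from `start` by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        repo = queue.popleft()
        for nxt in graph.get(repo, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def fully_retrievable(graph: LinkGraph, holders: Iterable[str], start: str) -> bool:
    """True if every repository holding documentation is reachable from `start`."""
    return set(holders) <= reachable_from(graph, start)


def entry_points(graph: LinkGraph, holders: Iterable[str]) -> Set[str]:
    """Repositories from which the whole documentation can be retrieved."""
    holder_set = set(holders)
    return {r for r in holder_set if fully_retrievable(graph, holder_set, r)}


if __name__ == "__main__":
    # Hypothetical topology: documentation spread over R1, R2, R3, linked in a chain.
    links = {"R1": {"R2"}, "R2": {"R3"}, "R3": set()}
    holders = ["R1", "R2", "R3"]
    print(fully_retrievable(links, holders, "R1"))  # True
    print(fully_retrievable(links, holders, "R2"))  # False: R1 cannot be reached
    print(entry_points(links, holders))             # {'R1'}
```

In this toy model, a link lost because of a failure (for example, dropping the R1 to R2 edge) leaves no repository from which the full documentation can be reached, which is the kind of situation a recording protocol must prevent.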