An error recovery scheme for concurrent processes
With the increasingly widespread use of multiprocessors and distributed computing systems, programmers need a simple, reliable interface to them. This thesis describes language constructs, and the mechanisms that support them, for implementing fault-tolerant concurrent processes. The basic language structure is the Atomic Action, supported by a modified recovery cache mechanism that combines the collection of recovery data with the locking of resources and allows recovery blocks to be integrated with Atomic Actions. Synchronisation between actions is discussed, together with a means of detecting and breaking deadlocks based on a "blocking graph". Reliable communication and cooperation between actions are considered, and several constructs are investigated. The limitations of Shared Atomic Actions are identified, and the use of a form of reliable "secretary" is shown to lead to unnecessary recovery activity. These problems are resolved by structures based on a classification of resources according to how programs use them. The thesis also describes trial implementations of some of these mechanisms and discusses existing concurrent programming techniques.
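To make the combination of recovery-data collection with locking, and blocking-graph deadlock detection, more concrete, here is a minimal single-threaded sketch. It is not the thesis's mechanism. The names `Resource`, `AtomicAction`, `BlockingGraph` and `Deadlock` are hypothetical. It assumes that a resource's prior value is saved in the action's recovery cache at the moment the lock is granted. It also assumes that each blocked action waits on exactly one holder, and that the action whose request would close a cycle is the one that is rolled back. Rather than blocking a real thread, `acquire` returns `False` to mean "would block" and leaves the caller to retry.

```python
class Deadlock(Exception):
    """Raised when a lock request would close a cycle in the blocking graph."""


class Resource:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.holder = None  # action currently holding the lock, if any


class BlockingGraph:
    """Hypothetical blocking graph: an edge A -> B means A waits for a resource B holds."""

    def __init__(self):
        self.edges = {}  # assumption: each blocked action waits on one holder

    def add_wait(self, waiter, holder):
        self.edges[waiter] = holder

    def remove_wait(self, waiter):
        self.edges.pop(waiter, None)

    def has_cycle_through(self, start):
        # Follow the single chain of waits; reaching start again means deadlock.
        visited = set()
        node = self.edges.get(start)
        while node is not None and node not in visited:
            if node is start:
                return True
            visited.add(node)
            node = self.edges.get(node)
        return False


class AtomicAction:
    def __init__(self, name, graph):
        self.name = name
        self.graph = graph
        self.cache = {}  # recovery cache: resource -> value when the lock was taken

    def acquire(self, res):
        if res.holder is self:
            return True
        if res.holder is None:
            res.holder = self
            self.cache[res] = res.value  # recovery data collected with the lock
            self.graph.remove_wait(self)  # no longer blocked
            return True
        # Resource is held by another action: record the wait, then check for a cycle.
        self.graph.add_wait(self, res.holder)
        if self.graph.has_cycle_through(self):
            self.graph.remove_wait(self)
            raise Deadlock(f"{self.name} would deadlock waiting for {res.name}")
        return False  # caller should retry once the holder finishes

    def write(self, res, value):
        if res.holder is not self:
            raise RuntimeError(f"{self.name} does not hold {res.name}")
        res.value = value

    def commit(self):
        self._release()

    def abort(self):
        # Backward recovery: restore every resource to its value at lock time.
        for res, old in self.cache.items():
            res.value = old
        self._release()

    def _release(self):
        for res in self.cache:
            res.holder = None
        self.cache.clear()
        self.graph.remove_wait(self)


# Usage: A and B each lock one resource, then each requests the other's.
g = BlockingGraph()
x, y = Resource("x", 1), Resource("y", 2)
a, b = AtomicAction("A", g), AtomicAction("B", g)
a.acquire(x)
a.write(x, 10)
b.acquire(y)
b.write(y, 20)
a_blocked = not a.acquire(y)  # A now waits for B
try:
    b.acquire(x)  # B -> A -> B would be a cycle
except Deadlock:
    b.abort()  # break the deadlock by rolling B back
a_got_y = a.acquire(y)  # A can proceed after B's recovery
a.commit()
```

Rolling back the requester that closes the cycle is only one possible policy for choosing a victim. A real scheme might instead pick the action with the least work to undo, and would use genuine blocking and wake-up instead of the retry convention used here.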