Fault-tolerant group communication protocols for asynchronous systems
It is widely accepted that group communication (multicast) is a powerful abstraction that can be used whenever a collection of distributed processes cooperate to achieve a common goal such as load-sharing or fault-tolerance. Due to the uncertainties inherent to distributed systems (emerging from communication and/or process failures), group communication protocols have to face situations where, for instance, a sender process fails when a multicast is underway or where messages from different senders arrive in an inconsistent order at different destination processes. Further complications arise if processes belong to multiple groups. In this thesis, we make use of logical clocks [Lamport78] to develop the concept of Causal Blocks. We show that Causal Blocks provide a concise method for deducing ordering relationships between messages exchanged by processes of a group, resulting in simple methods for dealing with multiple groups. Based on the Causal Blocks representation, we present a protocol for total order message delivery which has constant and low message space overhead (Le. the protocol related information contained in a multicast message is small). We also present causal order protocols with different trade-offs between message space overhead and speed of message delivery. Furthermore, we show how the Causal Blocks representation can be used to easily deduce and maintain reliability information. Our protocols are faulttolerant: ordering and liveness are preserved even if group membership changes occur (due to failures such as process crashes or network partitions). The total order protocol, together with a novel flow control mechanism, has been implemented over a set of networked Unix workstations, and experiments carried out to analyse its performance in varied group configurations.