Preventing state divergence in replicated distributed systems
N-Modular Redundancy (NMR) is a form of active replication in whicheach processor is replicated to form a node and each processor replica within the node executes the same set of software component replicas. Communication between nodes, in the form of messages, passes through a voting mechanism by which processor failures are masked. When the degree of replication is three, the technique is known as Triple Modular Redundancy (TMR) and can tolerate the failure of a single node processor. For voting to be successful, non-faulty software component replicas must output identical messages in an identical order. If we assume that software components are deterministic, then we need only ensure that the replicas process identical input messages in an identical order. Such software components conform to the well understood and researched state machine model of active replication. However, most distributed programs employ mechanisms not incorporated in the state machine model such as timeouts and prioritized messages. These potential sources of non-determinism could lead to a divergence of state among software component replicas which could then produce inconsistent responses to identical input messages, thereby defeating the NMR voting mechanism. The main contributions of this thesis are: (i) To present an architecture for active replicated processing which maybe applied to any distributed system. (ii) To present a more expressive, enhanced model for software components which incorporates non-determinism and show how a system of such software components may be replicated, using a single well-defined generic mechanism (the order process) to prevent state divergence. Since the problem of identical ordering can be formulated as the interactive consistency problem which is solvable in the presence of arbitrary (Byzantine) failures, the approach presented in this thesis, unlike any other published to date, is capable of tolerating such failures.