The design of robust protocols for distributed real-time systems
Modern distributed control systems comprise of a set of processors which are interconnected using a suitable communication network. For use in real-time control environments, such systems must be deterministic and generate specified responses within critical timing constraints. Also, they should be sufficiently robust to survive predictable events such as communication or processor faults. This thesis considers the problem of coordinating and synchronizing a distributed real-time control system under normal and abnormal conditions. Distributed control systems need to periodically coordinate the actions of several autonomous sites. Often the type of coordination required is the all or nothing property of an atomic action. Atomic commit protocols have been used to achieve this atomicity in distributed database systems which are not subject to deadlines. This thesis addresses the problem of applying time constraints to atomic commit protocols so that decisions can be made within a deadline. A modified protocol is proposed which is suitable for real-time applications. The thesis also addresses the problem of ensuring that atomicity is provided even if processor or communication failures occur. Previous work has considered the design of atomic commit protocols for use in non time critical distributed database systems. However, in a distributed real-time control system a fault must not allow stringent timing constraints to be violated. This thesis proposes commit protocols using synchronous communications which can be made resilient to a single processor or communication failure and still satisfy deadlines. Previous formal models used to design commit protocols have had adequate state coverability but have omitted timing properties. They also assumed that sites communicated asynchronously and omitted the communications from the model. Timed Petri nets are used in this thesis to specify and design the proposed protocols which are analysed for consistency and timeliness. Also the communication system is mcxielled within the Petri net specifications so that communication failures can be included in the analysis. Analysis of the Timed Petri net and the associated reachability tree is used to show the proposed protocols always terminate consistently and satisfy timing constraints. Finally the applications of this work are described. Two different types of applications are considered, real-time databases and real-time control systems. It is shown that it may be advantageous to use synchronous communications in distributed database systems, especially if predictable response times are required. Emphasis is given to the application of the developed commit protocols to real-time control systems. Using the same analysis techniques as those used for the design of the protocols it can be shown that the overall system performs as expected both functionally and temporally.