Global optimisation of communication protocols for bulk synchronous parallel computation
In the Bulk Synchronous Parallel (BSP) model of parallel computation, as represented by BSPlib, global computation, communication and synchronisation are only loosely coupled. The model provides a definite semantics but does not prescribe exactly when and where communication is to be carried out during the computation. It states only that communication cannot happen before the application requests it, and that at certain points local computation cannot proceed until updates from the other participating processors have been applied. The implementation of the runtime system can exploit this freedom and can be adapted to particular physical environments without requiring changes to the application program. This bulk, global view of parallel computation can be used to implement protocols that both maintain and take into account global state in order to optimise performance. Such global protocols can deliver performance improvements that are not easily achieved with local, greedy strategies, even though they may be locally sub-optimal. This global perspective, together with the exploitable nature of BSP computation, is applied to congestion avoidance, transport layer protocols suitable for BSP computation, global stable check-pointing, and work process placement and migration, in order to achieve better overall performance.
An important consideration when composing parallel computer systems into larger systems is that the composite can exhibit good performance only if the individual components also do so. However, it is not obvious how the individual components contribute to global performance. As already noted, strategies that are not locally optimal may lead to globally optimal performance; equally important is that variance observed at the local level also influences global performance.
A number of decisions in the design and implementation of the transport protocol have been made so as to minimise the observed variance in the protocol's behaviour. The BSP model is used to demonstrate why this is required. The analysis also suggests a regression technique that can be applied to sampled global performance data.
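The role of variance can be illustrated with a small simulation. The sketch below is not taken from this work; it assumes, purely for illustration, that each of p processors finishes a superstep at an independent Gaussian-distributed time with the same mean, and that the superstep ends when the slowest processor reaches the barrier. The function name and all parameter values are hypothetical.

```python
import random
import statistics

def mean_superstep_time(p, mu, sigma, trials=2000, seed=0):
    """Estimate the mean duration of a superstep on p processors.

    Assumption (illustrative only): per-processor completion times are
    independent Gaussian samples with mean mu and standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    durations = []
    for _ in range(trials):
        # The barrier releases only when the slowest processor arrives,
        # so the superstep lasts as long as the maximum local time.
        durations.append(max(rng.gauss(mu, sigma) for _ in range(p)))
    return statistics.fmean(durations)

# Same mean local time, different local variance.
low = mean_superstep_time(p=16, mu=1.0, sigma=0.01)
high = mean_superstep_time(p=16, mu=1.0, sigma=0.10)
print(f"low-variance superstep:  {low:.3f}")
print(f"high-variance superstep: {high:.3f}")
```

Under these assumptions the mean local time is identical in both runs, yet the higher-variance run yields a longer average superstep, because the barrier waits for the maximum rather than the mean.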