Seamless parallel computing on heterogeneous networks of multiprocessor workstations
This thesis is concerned with portable, efficient and, above all, seamless parallel programming of heterogeneous networks of shared memory multiprocessor workstations. The CSP model of concurrency, as embodied in the occam language, is used to provide an architecture-independent and elegant view of concurrent systems. Tools and techniques for efficiently executing finely decomposed parallel programs on uniprocessor workstations, shared memory multiprocessor workstations, and networks of both are examined in some detail. In particular, the thesis studies scheduling strategies that batch related processes together to reduce cache-related context switching overheads on uniprocessors and to reduce contention and false sharing on shared memory multiprocessors. New wait-free CSP channel algorithms for shared memory multiprocessors are presented, as well as implementations of CSP channel algorithms across commodity network interconnects. A virtual parallel computer abstraction is applied to hide the inherent heterogeneity of workstation networks and to enable seamless execution of parallel programs. The performance of moderate to very fine grain parallelism on uniprocessors and shared memory multiprocessors is investigated, and the performance of CSP channels across TCP/IP networks is also scrutinized. The results indicate that fine grain parallelism can be handled efficiently in software on uniprocessors and shared memory multiprocessors, though issues related to caching warrant careful consideration. Further results show that a limited amount of computation-communication overlap can be attained even with commodity network adapters that require significant processor interaction to sustain data transfer. This thesis demonstrates that seamless parallel programming across a variety of contemporary architectures using the CSP/occam model is a viable, as well as an attractive, option.