Parallel algorithms for numerical linear algebra on a shared memory multiprocessor
This thesis discusses a variety of parallel algorithms for linear algebra problems, including the solution of the linear system of equations Ax = b using QR and LU decomposition, reduction of a general matrix A to Hessenberg form, reduction of a real symmetric matrix B to tridiagonal form, and solution of the symmetric tridiagonal eigenproblem. Several versions of each algorithm are compared empirically. We also compare three different synchronisation mechanisms applied to the reduction to Hessenberg form. We implement Cuppen's method for computing both eigenvalues and eigenvectors of a real symmetric tridiagonal matrix T in both recursive and non-recursive forms. We consider parallel implementations of both forms, and also consider parallelising the matrix multiplication part of the algorithm. We present numerical results that evaluate experimentally the effect of deflation on accuracy, compare the parallel implementations, and assess the additional parallelisation of the matrix multiplication. The algorithms investigated involve varying amounts of overlap between different parts of the calculation, and collect updates together as far as possible to make good use of the storage hierarchy of the shared memory multiprocessor. Algorithms using dynamic task allocation are compared with ones which do not. The results presented were obtained using the C++ programming language, with parallel constructs provided by the Encore Parallel Threads package, on a shared memory Encore Multimax (MIMD) computer. The experimental results demonstrate that dynamic task allocation can sometimes be very effective on this machine, and that very high efficiency is often obtainable, even for relatively small matrices, when the parallel algorithms are constructed carefully.
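The distinction between static and dynamic task allocation can be illustrated with a minimal sketch. It is not the thesis code: it assumes standard C++ threads (`std::thread`, `std::atomic`) in place of the Encore Parallel Threads package, uses LU factorisation without pivoting purely as a convenient workload, and the names `run_static`, `run_dynamic` and `lu_in_place` are hypothetical. In the static version each row update is assigned to a thread in advance; in the dynamic version each thread claims the next unclaimed row from a shared counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Static allocation (illustrative): task i is fixed in advance to
// thread (i mod num_threads).
void run_static(std::size_t num_tasks, unsigned num_threads,
                const std::function<void(std::size_t)>& task) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([=, &task] {
            for (std::size_t i = t; i < num_tasks; i += num_threads) task(i);
        });
    }
    for (auto& w : workers) w.join();
}

// Dynamic allocation (illustrative): each thread repeatedly claims the
// next unclaimed task index from a shared atomic counter.
void run_dynamic(std::size_t num_tasks, unsigned num_threads,
                 const std::function<void(std::size_t)>& task) {
    assert(num_threads > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);
                if (i >= num_tasks) break;
                task(i);
            }
        });
    }
    for (auto& w : workers) w.join();
}

// In-place LU factorisation WITHOUT pivoting of a row-major n x n matrix
// (an assumption made to keep the sketch short). At step k the updates of
// rows k+1..n-1 are independent, so each row update is one task.
// Threads are created and joined at every step, which is simple but costly.
void lu_in_place(std::vector<double>& a, std::size_t n,
                 unsigned num_threads, bool dynamic) {
    assert(a.size() == n * n);
    for (std::size_t k = 0; k + 1 < n; ++k) {
        const std::size_t rows = n - k - 1;
        auto update_row = [&](std::size_t r) {
            const std::size_t i = k + 1 + r;
            const double m = a[i * n + k] / a[k * n + k];  // multiplier
            a[i * n + k] = m;                              // store L entry
            for (std::size_t j = k + 1; j < n; ++j)
                a[i * n + j] -= m * a[k * n + j];          // update U part
        };
        if (dynamic) run_dynamic(rows, num_threads, update_row);
        else         run_static(rows, num_threads, update_row);
    }
}
```

Calling `lu_in_place(a, n, 4, true)` overwrites `a` with the multipliers of L below the diagonal and U on and above it. Both allocation strategies produce the same factors; they differ only in how the row updates are shared among threads, which matters when task costs are uneven or processors are not equally available.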