Use this URL to cite or link to this record in EThOS:
Title: Scalable multithreaded algorithms for mutable irregular data with application to anisotropic mesh adaptivity
Author: Rokos, Georgios
ISNI:       0000 0004 5349 6919
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2014
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Anisotropic mesh adaptation is a powerful way to directly minimise the computational cost of mesh based simulation. It is particularly important for multi-scale problems where the required number of floating-point operations can be reduced by orders of magnitude relative to more traditional static mesh approaches. Increasingly, finite element/volume codes are being optimised for modern multicore architectures. Inter-node parallelism for mesh adaptivity has been successfully implemented by a number of groups using domain decomposition methods. However, thread-level parallelism using programming models such as OpenMP is significantly more challenging because the underlying data structures are extensively modified during mesh adaptation and a greater degree of parallelism must be realised while keeping the code race-free. In this thesis we describe a new thread-parallel implementation of four anisotropic mesh adaptation algorithms, namely edge coarsening, element refinement, edge swapping and vertex smoothing. For each of the mesh optimisation phases we describe how safe parallel execution is guaranteed by processing workitems in batches of independent sets and using a deferred-operations strategy to update the mesh data structures in parallel without data contention. Scalable execution is further assisted by creating worklists using atomic operations, which provides a synchronisation-free alternative to reduction-based worklist algorithms. Additionally, we compare graph colouring methods for the creation of independent sets and present an improved version which can run up to 50% faster than existing techniques. Finally, we describe some early work on an interrupt-driven work-sharing for-loop scheduler which is shown to perform better than existing work-stealing schedulers. Combining all aforementioned novel techniques, which are generally applicable to other unordered irregular problems, we show that despite the complex nature of mesh adaptation and inherent load imbalances, we achieve a parallel efficiency of 60% on an 8-core Intel(R) Xeon(R) Sandy Bridge and 40% using 16 cores on a dual-socket Intel(R) Xeon(R) Sandy Bridge ccNUMA system.
Supervisor: Kelly, Paul Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available