Title: Smoothed particle hydrodynamics on graphics processing units
Author: McCabe, Christopher
Awarding Body: Manchester Metropolitan University
Current Institution: Manchester Metropolitan University
Date of Award: 2012
A recent development in Computational Fluid Dynamics (CFD) has been the meshless method called Weakly Compressible Smoothed Particle Hydrodynamics (WCSPH), a Lagrangian method that tracks the physical quantities of a fluid as it moves in time and space. One disadvantage of WCSPH is the small time steps required by the weakly compressible Tait equation of state, so large-scale simulations using WCSPH have so far been rare and have only been performed on very expensive CPU-based supercomputers. As CFD simulations grow larger and more detailed, the need for high-performance computing also grows. There is therefore great interest in any computer technology that can provide the computational power of a CPU-based supercomputer for a fraction of the cost, hence the excitement aroused in the SPH community by the Graphics Processing Unit (GPU). The GPU offers great potential for significant increases in computational performance, with a much smaller size and power consumption than the more established high-performance computers comprising hundreds or thousands of CPUs. However, there are some disadvantages in programming GPUs. The memory structure of the GPU is more complex and more variable in speed, and other factors can seriously affect performance, such as the thread grid dimensions, which drive the occupancy of the GPU. The aim of this thesis is to describe how WCSPH can be efficiently implemented on multiple GPUs. First, some CFD methods and their success or otherwise in simulating free surfaces are discussed, and examples of previous attempts at implementing CFD algorithms on GPUs are given. The mathematical theory of WCSPH is then presented, followed by a detailed examination of the architecture of a GPU and how to program a GPU.
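The time-step restriction mentioned above can be illustrated with the form of the Tait equation of state commonly used in the WCSPH literature. This is a sketch under stated assumptions, not necessarily the exact formulation used in the thesis: the symbols $B$, $\gamma$, $c_0$, $\rho_0$, $h$ and $C_{\mathrm{CFL}}$ are introduced here for illustration, and the time-step bound is a simplified version that keeps only the sound-speed term.

```latex
% Tait equation of state (common WCSPH form; gamma = 7 is the usual choice for water)
P = B\left[\left(\frac{\rho}{\rho_0}\right)^{\gamma} - 1\right],
\qquad
B = \frac{c_0^{2}\,\rho_0}{\gamma}

% Simplified CFL-type bound on the time step, with smoothing length h
\Delta t \le C_{\mathrm{CFL}}\,\frac{h}{c_0}
```

Because the numerical speed of sound $c_0$ is usually chosen to be roughly ten times the maximum fluid speed, so that density variations stay near one percent, the admissible $\Delta t$ is small, and a simulation of a given physical duration needs many steps.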
Two different implementations of the same WCSPH algorithm are then described, both simulating a well-known experiment of the collapse of a column of water, to highlight two possible uses of the GPU memory. The first method uses the fast shared memory of the GPU, which is recommended by the GPU manufacturer, while the second method uses the texture memory of the GPU, which acts as a cache. Because the theory of WCSPH allows particles to interact only with other particles a short distance apart, it is shown that, despite the speed of the shared memory and the power of coalescing data into the shared memory, the texture memory method is currently the more efficient, but that this method of implementing WCSPH on a single GPU requires much more complex programming than the shared memory method. It is also shown that the size of the thread block can have a significant effect on performance. Riemann solvers add computational effort but can provide more accuracy. The use of Riemann solvers in WCSPH and their success or otherwise is then examined, and the results and performance of one particular WCSPH algorithm that uses an approximate Riemann solver, executed on a GPU, are reported. The treatment of boundaries has been and continues to be a problem in WCSPH, and there are a number of creative proposals for boundary treatments. Some of these are described in detail before a new boundary treatment is proposed that builds upon a recently proposed boundary treatment and reduces its execution time on a GPU by using the registers rather than the slower memories of the GPU. This new boundary treatment builds a unique private grid of boundary particles for each fluid particle close to the boundary. All computation is performed in the registers, the properties of the boundary particles depend on the fluid particle only, and there is no requirement to recall data from the slower global or texture memories of the GPU.
The new boundary treatment is also shown to propagate a solitary wave further, to preserve the wave height better and to take less execution time than the original boundary treatment it builds on. A unique and simple implementation of WCSPH on multiple GPUs is then described, and the results of a simulation of the collapse of a column of water in 3D are reported and compared against the results from a simulation of the same problem with the same WCSPH algorithm executed on a large cluster of multi-core CPUs. The conclusion is that simulations on a small cluster of GPUs can achieve greater performance than a cluster of multi-core CPUs, but to achieve this the slow GPU memories, including the texture memory, must be avoided by using the registers as much as possible, and the architecture of the network linking the GPUs together must be exploited. The former was achieved by using the new boundary treatment proposed in this thesis and discussed above, and the latter was achieved by the use of the MPI Group functionality. The GPUs used for this thesis were already connected together in boxes of 4 by the manufacturer. The cluster used for this thesis consisted of 8 of these boxes, giving a total of 32 GPUs. These boxes of 4 GPUs were connected together through a common host, but communication over the connection between a box and the host is much slower than communication between the GPUs inside a box. The total communication time was minimized by grouping the GPUs inside each box together with their own private MPI communicator, and a communication procedure was created to minimize communication over the relatively slow connection between the boxes of GPUs and the host. Finally, some conclusions are drawn and suggestions for further work are made.
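The grouping of GPUs by box described above can be sketched in a few lines. The sketch below is illustrative only and is not the thesis code: it uses plain Python rather than MPI, the function names `box_layout`, `build_groups` and `crosses_host_link` are hypothetical, and it assumes that global ranks are numbered box by box, which the abstract does not state. Only the figures of 4 GPUs per box and 8 boxes come from the abstract.

```python
GPUS_PER_BOX = 4  # from the abstract: GPUs are connected in boxes of 4
NUM_BOXES = 8     # from the abstract: 8 boxes, 32 GPUs in total


def box_layout(world_rank, gpus_per_box=GPUS_PER_BOX):
    """Return (box index, rank inside the box) for a global rank.

    Assumes ranks are numbered box by box (an illustrative assumption).
    """
    return divmod(world_rank, gpus_per_box)


def build_groups(num_boxes=NUM_BOXES, gpus_per_box=GPUS_PER_BOX):
    """List the global ranks that would share each per-box communicator."""
    return [
        list(range(box * gpus_per_box, (box + 1) * gpus_per_box))
        for box in range(num_boxes)
    ]


def crosses_host_link(rank_a, rank_b, gpus_per_box=GPUS_PER_BOX):
    """True if a message between the two ranks has to leave its box."""
    box_a, _ = box_layout(rank_a, gpus_per_box)
    box_b, _ = box_layout(rank_b, gpus_per_box)
    return box_a != box_b


# One rank list per box; each list corresponds to one private communicator.
groups = build_groups()
```

In an MPI program, each rank list could be turned into a communicator with calls such as `MPI_Group_incl` followed by `MPI_Comm_create`. A predicate like `crosses_host_link` then identifies the exchanges that travel over the slower box-to-host connection, which are the ones a communication procedure would try to keep to a minimum.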
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available