Abstract
In recent years, there has been a growing focus on using GPUs for general-purpose computation. As a result, the CPU has received less attention as a computational resource, and heterogeneous computers with both a CPU and a GPU may not utilize all of their resources. To address this, I present a heterogeneous CPU and GPU implementation based on a high-resolution explicit scheme for the shallow-water equations implemented on a single GPU (Brodtkorb et al. 2012). The implementation uses two levels of parallelism. First, a row-wise domain decomposition splits the computational domain so that the CPU and the GPU work in parallel. Second, the CPU code is multi-threaded to take advantage of all cores. Furthermore, systems of conservation laws often involve large computational domains in which only parts of the domain, e.g., those containing water or other fluids, have to be computed. This can lead to a workload imbalance between the GPU and the CPU. To address this, I present dynamic auto-tuning methods that automatically adjust the domain decomposition between the CPU and the GPU at runtime, as well as optimization techniques that skip computations for "dry" areas of the domain.