Abstract:
An optimization of communications within multi-level parallelization based on a combination of MPI, OpenMP, and OpenCL is proposed to suit all kinds of modern supercomputer architectures, including hybrid systems with GPU and Intel Xeon Phi accelerators. A general-purpose scheduler that simplifies heterogeneous implementations is discussed. The scheduler controls queues of computing and communication OpenCL tasks, taking an interdependency graph of the target computing algorithm as input. Its use is demonstrated on the example of a finite-volume CFD algorithm for unstructured meshes; in particular, the scheduler has been used to simplify the implementation of an overlapped communication scheme. The implementation of the CFD algorithm, with MPI and CPU-GPU communications overlapped with computations, is described and its parallel efficiency is demonstrated.