Abstract:
One of the most commonly used on practice all-pairs shortest paths algorithms on weighted graphs is Floyd – Warshall algorithm. Blocked version serves as a basis for obtaining effective parallel algorithms to be implemented on multicore central processing units, on computers with distributed memory, on graphics processing units (GPU). Increasing computation granularity in blocked versions of algorithm leads to a more efficient usage of caches and more efficient organization of parallel computations. In this paper we introduce generalization of blocked Floyd – Warshall algorithm. Computing blocks execution order was reorganized in such a way that array elements which participate in communication operations, both reading and writing, are pushed out of fast-access memory less often. This means that in GPU implementation slow global memory is used less often, compared with the original blocked algorithm.