Abstract:
Breadth-First Search (BFS) is one of the most important kernels in graph computing. It is the main kernel of the Graph500 rating that evaluates performance of large supercomputers and multiprocessor nodes in terms of traversed edges per second (TEPS). In this paper we present the results of BFS performance evaluation on a recently released high-performance Intel Xeon Phi coprocessor. We examine the previously proposed Queue-based and Read-based approaches to BFS implementation. We also apply several optimization techniques, such as manual loop unrolling and prefetching, that significantly improve performance on Intel Xeon Phi. On a representative graph set, Intel Xeon Phi 7120P demonstrates 78% maximum and 37% average speedup as compared to the Intel Xeon E5-2660 processor. We achieve 4366 MTEPS on Intel Xeon Phi 7120P for a graph with scale 25, and have 89th place on the November 2013 Graph500 list. This is the fourth place among research teams in the class of single node x86-based systems. The authors would like to thank the Svet Computers company for the provided IntellectDigital SciPhi 470 desktop supercomputer with Intel Xeon Phi 7120P coprocessor.