Abstract:
The parallel performance of loops over two-dimensional arrays, in particular matrix
multiplication, is considered. The most effective matrix multiplication algorithm for
distributed-memory supercomputers, which requires data dependence analysis, is designed.
A method for automatically distributing the matrices across the memory of each processor,
together with the interprocessor communication, is described. A program that detects
matrix multiplication and replaces it with a parallel form is considered. Performance
results for the parallel program on the nCube 2S supercomputer are presented.