Abstract:
The acceleration of iterative algorithms, found in solving problems
of mathematical physics, mathematical modeling, and image processing, is
considered. In the software implementation of these algorithms, there are nested
loops (sections of the program that consist of nested loops). These loop nests can
be accelerated by combination of optimizing transformations, including tiling,
hyperplane method, and parallelization on shared memory. The equivalence of
this combination of program transformations is substantiated.
A method for changing the order of tile traversal is proposed and justified. The
method provides acceleration by increasing data readings from registers instead of
slower memory. Considering this method, a formula for calculating optimal tile
sizes is obtained.
The combination of transformations presented in this article results in an
acceleration that is 1.4 times greater than the well-known optimization algorithm
implemented in PLUTO. In some cases using an 8-core processor, numerical
experiments show a significant increase in speed compared to the original
sequential algorithms. The findings of this article can be applied to manual and
automatic program optimization.
Key words and phrases:tiling, wavefront, parallelization, shared memory, iterative stencil loops.