Abstract:
Stencil algorithms are widely used in areas of mathematical modeling on regular grids, the evolution of cellular automata (such as the game “life”), image processing, sequence analysis, etc.
Such algorithms are well parallelized, but the usual approaches to parallelization have low temporal locality, which limits their scalability.
It is possible to get rid of this drawback by using different re-ordering schemes for processing points, when the space is divided into small areas that fit into the cache, in which it is possible to advance by several iterations at once.
However, such programs are difficult to write and debug. There is a simple pyramid method, but it doesn’t scale well due to some duplication of calculations.
Our approach is to use more sophisticated reordering schemes without duplication, for which code can be generated automatically from a relatively simple scheme specification.
In this case, the schemes themselves are defined by specifying distribution functions that distribute the computational points over space and time. This article outlines the approach and considers, for a sample source, various code variants generated from different distribution functions. (In Russian).
Key words and phrases:stencil algorithms, parallel computing, auto-parallelization, temporal locality, cache memory, pyramid method, dataflow computation model, placement, scheduling.