Abstract:
A dataflow processor can issue up to 16 instructions per clock, in contrast to 4–6 instructions per clock for the best von Neumann processor designs. Simulation of our vector dataflow processor shows that matrix multiplication performance reaches 256 flops per clock while issuing fewer than eight instructions per clock. It also sustains near-peak performance on much smaller matrix dimensions than a traditional processor does. We also analyze the advantages and disadvantages of floating-point fused multiply-add execution units in our vector dataflow processor design. (In Russian).
Key words and phrases: supercomputer; vector processor; dataflow architecture; performance evaluation; fine-grained parallelism; fused multiply-adders.