
Program Systems: Theory and Applications, 2020 Volume 11, Issue 3, Pages 33–59 (Mi ps369)

This article is cited in 2 papers

Hardware and Software for Supercomputers

Multiple-precision matrix-vector multiplication on graphics processing units

K. Isupov^a, V. S. Knyazkov^b

^a Vyatka State University
^b Penza State University

Abstract: We consider a parallel implementation of matrix-vector multiplication (GEMV, Level 2 of the BLAS) for graphics processing units (GPUs) using multiple-precision arithmetic based on the residue number system. In our GEMV implementation, element-wise operations with multiple-precision vectors and matrices are split into several parts, each of which is computed by a separate CUDA kernel. This design eliminates branch divergence when performing the sequential parts of multiple-precision operations and allows full utilization of the GPU's resources. An efficient data structure for storing arrays with multiple-precision entries provides a coalesced access pattern to the GPU global memory. We have performed a rounding error analysis and derived error bounds for the proposed GEMV implementation. Experimental results show the high efficiency of the proposed solution compared to existing high-precision packages deployed on GPUs.

Key words and phrases: multiple-precision computations, BLAS, GEMV, parallel algorithms, CUDA, GPU, residue number system.

UDC: 004.222+004.272.25
BBK: З973:З972.1

Received: 29.04.2020
Accepted: 24.07.2020

DOI: 10.25209/2079-3316-2020-11-3-33-59


English version: 2020, 11:3, 61–84

