Abstract:
We investigated the efficiency of the extended-precision library QD on the Tesla C1060, GeForce GTX480, and Tesla C2050 GPUs. We added CUDA algorithms for the main linear algebra operations to the library. The algorithms are optimized for maximum performance. In particular, matrix computations with extended precision on the Tesla C2050 achieve a speed-up of about 100x; this is close to the performance limit, since the elementary arithmetic operations themselves speed up at the same rate.